Preface


The use of sophisticated mathematical tools in modern finance is now common- 
place. Researchers and practitioners routinely run simulations or solve differential 
equations to price securities, estimate risks, or determine hedging strategies. 
Some of the most important tools employed in these computations are opti- 
mization algorithms. Many computational finance problems ranging from asset 
allocation to risk management, from option pricing to model calibration, can 
be solved by optimization techniques. This book is devoted to explaining how 
to solve such problems efficiently and accurately using the state of the art in 
optimization models, methods, and software. 

Optimization is a mature branch of applied mathematics. Typical optimization 
problems have the goal of allocating limited resources to alternative activities in 
order to maximize the total benefit obtained from these activities. Through 
decades of intensive and innovative research, fast and reliable algorithms 
and software have become available for many classes of optimization prob- 
lems. Consequently, optimization is now being used as an effective manage- 
ment and decision-support tool in many industries, including the financial 
industry. 

This book discusses several classes of optimization problems encountered in 
financial models, including linear, quadratic, integer, dynamic, stochastic, conic, 
and nonlinear programming. For each problem class, after introducing the rele- 
vant theory (optimality conditions, duality, etc.) and efficient solution methods, 
we discuss several problems of mathematical finance that can be modeled within 
this problem class. 

The second edition includes a more detailed discussion of mean-variance opti- 
mization, multi-period models, and additional material to highlight the relevance 
to finance. 

The book’s structure has also been clarified for the second edition; it is now 
organized in four main parts, each comprising several chapters. Part I guides 
the reader through the solution of asset liability cash flow matching using lin- 
ear programming techniques, which are also used to explain asset pricing and 
arbitrage. Part II is devoted to single-period models. It provides a thorough 
treatment of mean-variance portfolio optimization models, including derivations 
of the one-fund and two-fund theorems and their connection to the capital asset 
pricing model, a discussion of linear factor models that are used extensively 
in risk and portfolio management, and techniques to deal with the sensitivity of
mean-variance models to parameter estimation. We discuss integer programming 
formulations for portfolio construction problems with cardinality constraints, and 
we explain how this is relevant to constructing an index fund. The final chapters 
of Part II present a stochastic programming approach to modeling measures of 
risk other than the variance, including the popular value at risk and conditional 
value at risk. 

Part III of the book discusses multi-period models such as the iconic Kelly cri- 
terion and binomial lattice models for asset pricing as well as more elaborate and 
modern models for optimal trade execution, dynamic portfolio optimization with 
transaction costs and taxes, and asset-liability management. These applications 
showcase techniques from dynamic and stochastic programming. 

Part IV is devoted to more advanced optimization techniques. We introduce 
conic programming and discuss applications such as the approximation of covari- 
ance matrices and robust portfolio optimization. The final chapter of Part IV 
covers one of the most general classes of optimization models, namely nonlinear 
programming, and applies it to volatility estimation. 

This book is intended as a textbook for Master’s programs in financial engi- 
neering, finance, or computational finance. In addition, the structure of chapters, 
alternating between optimization methods and financial models that employ 
these methods, allows the book to be used as a primary or secondary text 
in upper-level undergraduate or introductory graduate courses in operations 
research, management science, and applied mathematics. A few sections are 
marked with a ‘*’ to indicate that the material they contain is more technical 
and can be safely skipped without loss of continuity. 

Optimization algorithms are sophisticated tools and the relationship between 
their inputs and outputs is sometimes opaque. To maximize the value from using 
these tools and to understand how they work, users often need a significant 
amount of guidance and practical experience with them. This book aims to 
provide this guidance and serve as a reference tool for the finance practitioners 
who use or want to use optimization techniques. 

This book has benefited from the input provided by instructors and students 
in courses at various institutions. We thank them for their valuable feedback 
and for many stimulating discussions. We would also like to thank the colleagues 
who provided the initial impetus for this book and colleagues who collaborated 
with us on various research projects that are reflected in the book. We especially 
thank Kathie Cameron, the late Rick Green, Raphael Hauser, John Hooker, 
Miroslav Karamanov, Mark Koenig, Masakazu Kojima, Vijay Krishnamurthy, 
Miguel Lejeune, Yanjun Li, François Margot, Ana Margarida Monteiro, Mustafa 
Pınar, Sebastian Pokutta, Sanjay Srivastava, Michael Trick, and Luís Vicente. 


Introduction


Overview of Optimization Models 


Optimization is the process of finding the best way of making decisions that 
satisfy a set of constraints. In mathematical terms, an optimization model is a 
problem of the form 
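\[
\begin{array}{ll}
\min\limits_{x} & f(x) \\
\text{s.t.} & x \in \mathcal{X},
\end{array}
\tag{1.1}
\]

where \(f : \mathbb{R}^n \to \mathbb{R}\) and \(\mathcal{X} \subseteq \mathbb{R}^n\).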

Model (1.1) has three main components, namely the vector of decision variables
\(x := [x_1 \ \cdots \ x_n] \in \mathbb{R}^n\); the objective function \(f(x)\); and the constraint
set or feasible region \(\mathcal{X}\). The constraint set is often expressed in terms of equalities
and inequalities involving additional functions. More precisely, the constraint set
\(\mathcal{X}\) is often of the form


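\[
\mathcal{X} = \{x \in \mathbb{R}^n : g_i(x) = b_i \text{ for } i = 1,\dots,m, \text{ and } h_j(x) \le d_j \text{ for } j = 1,\dots,p\},
\tag{1.2}
\]

for some \(g_i, h_j : \mathbb{R}^n \to \mathbb{R}\), \(i = 1,\dots,m\), \(j = 1,\dots,p\). When this is the case, the
optimization problem (1.1) is usually written in the form

\[
\begin{array}{lll}
\min & f(x) & \\
\text{s.t.} & g_i(x) = b_i, & \text{for } i = 1,\dots,m \\
& h_j(x) \le d_j, & \text{for } j = 1,\dots,p,
\end{array}
\]

or in the more concise form

\[
\begin{array}{ll}
\min & f(x) \\
\text{s.t.} & g(x) = b \\
& h(x) \le d.
\end{array}
\]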


We will use the following terminology. A feasible point or feasible solution to
(1.1) is a point in the constraint set \(\mathcal{X}\). An optimal solution to (1.1) is a feasible
point that attains the best possible objective value; that is, a point \(x^* \in \mathcal{X}\)
such that \(f(x^*) \le f(x)\) for all \(x \in \mathcal{X}\). The optimal value of (1.1) is the value
of the objective function at an optimal solution; that is, \(f(x^*)\) where \(x^*\) is an
optimal solution to (1.1). If the feasible region \(\mathcal{X}\) is of the form (1.2) and \(x \in \mathcal{X}\),
the binding constraints at \(x\) are the equality constraints and those inequality
constraints that hold with equality at \(x\). The term active constraint is also often
used in lieu of “binding constraint”. The problem (1.1) is infeasible if \(\mathcal{X} = \emptyset\). On
the other hand, (1.1) is unbounded if there exist \(x_k \in \mathcal{X}\), \(k = 1, 2, \dots\), such that
\(f(x_k) \to -\infty\).
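
The terminology above can be made concrete with a small sketch. The toy instance below is hypothetical (it is not taken from the text): one equality constraint, two inequality constraints, and a numerical tolerance `TOL` that is an implementation assumption. It checks whether a point is feasible and reports which inequality constraints are binding there.

```python
# Hypothetical toy instance of a constraint set of the form (1.2):
#   g(x) = x1 + x2 = 1,   h1(x) = -x1 <= 0,   h2(x) = -x2 <= 0.
TOL = 1e-9  # numerical tolerance (an assumption of this sketch)


def g(x):
    """Left-hand sides of the equality constraints g_i(x) = b_i."""
    return [x[0] + x[1]]


def h(x):
    """Left-hand sides of the inequality constraints h_j(x) <= d_j."""
    return [-x[0], -x[1]]


b = [1.0]
d = [0.0, 0.0]


def is_feasible(x):
    """True if x satisfies every equality and inequality constraint."""
    eq_ok = all(abs(gi - bi) <= TOL for gi, bi in zip(g(x), b))
    ineq_ok = all(hj <= dj + TOL for hj, dj in zip(h(x), d))
    return eq_ok and ineq_ok


def binding_inequalities(x):
    """Indices j of the inequality constraints that hold with equality at x."""
    return [j for j, (hj, dj) in enumerate(zip(h(x), d)) if abs(hj - dj) <= TOL]
```

For instance, the point (1, 0) is feasible and its second inequality constraint is binding, whereas (1.5, -0.5) satisfies the equality but violates an inequality and is therefore infeasible. Equality constraints are binding at every feasible point by definition, so only the inequalities need to be inspected.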


Types of Optimization Models 


For optimization models to be of practical interest, their computational tractabil- 
ity, that is, the ability to find the optimal solution efficiently, is a critical issue. 
Particular structural assumptions on the objective and constraints of the problem 
give rise to different classes of optimization models with various degrees of 
computational difficulty. We should note that the following is only a partial classi- 
fication based on the current generic tractability of various types of optimization 
models. However, what is “tractable” in some specific context may be more 
nuanced. Furthermore, tractability evolves as new algorithms and technologies 
are developed. 


Convex optimization: These are problems where the objective \(f(x)\) is a convex
function and the constraint set \(\mathcal{X}\) is a convex set. This class of
optimization models is tractable most of the time. By this we mean that 
a user can expect any of these models to be amenable to an efficient algo- 
rithm. We will emphasize this class of optimization models throughout 
the book. 

Mixed integer optimization: These are problems where some of the variables 
are restricted to take integer values. This restriction makes the constraint
set \(\mathcal{X}\) non-convex. This class of optimization models is somewhat
tractable a fair portion of the time. By this we mean that a model of this 
class may be solvable provided the user does some judicious modeling 
and has access to high computational power. 

Stochastic and dynamic optimization: These are problems involving ran- 
dom and time-dependent features. This class of optimization models 
is tractable only in some special cases. By this we mean that, unless 
some specific structure and assumptions hold, a model of this class 
would typically be insoluble with any realistic amount of computational 
power at our disposal. Current research is expected to enrich the class 
of tractable models in this area. 


The modeling of time and uncertainty is pervasive in almost every financial 
problem. The various types of optimization problems that we will discuss are 
based on how they deal with these two issues. Generally speaking, static models 
are associated with simple single-period models where the future is modeled as 
a single stage. By contrast, in multi-period models the future is modeled as a 
sequence, or possibly as a continuum, of stages. With regard to uncertainty, 
deterministic models are those where all the defining data are assumed to be 
known with certainty. By contrast, stochastic models are ones that incorporate 
probabilistic or other types of uncertainty in the data. 

A good portion of the models that we will present in this book will be convex
optimization models due to their favorable mathematical and computational 
properties. There are two special types of convex optimization problems that we 
will use particularly often: linear and quadratic programming, the latter being an 
extension of the former. These two types of optimization models will be discussed 
in more detail in Chapters 2 and 5. We now present a high-level description 
of four major classes of optimization models: linear programming, quadratic 
programming, mixed integer programming, and stochastic optimization. 


Linear Programming 


A linear programming model is an optimization problem where the objective is a 
linear function and the constraint set is defined by finitely many linear equalities 
and linear inequalities. In other words, a linear program is a problem of the form 


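\[
\begin{array}{ll}
\min\limits_{x} & c^{\mathsf T} x \\
\text{s.t.} & Ax = b \\
& Dx \ge d
\end{array}
\]

for some vectors \(c \in \mathbb{R}^n\), \(b \in \mathbb{R}^m\), \(d \in \mathbb{R}^p\) and matrices \(A \in \mathbb{R}^{m \times n}\), \(D \in \mathbb{R}^{p \times n}\).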

The term linear optimization is sometimes used in place of linear programming. 
The wide popularity of linear programming is due in good part to the availability 
of very efficient algorithms. The two best known and most successful methods 
for solving linear programs are the simplex method and interior-point methods. 
We briefly discuss these algorithms in Chapter 2. 


Quadratic Programming 


Quadratic programming, also known as quadratic optimization, is an extension 
of linear programming where the objective function includes a quadratic term. 
In other words, a quadratic program is a problem of the form 
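\[
\begin{array}{ll}
\min & \tfrac{1}{2} x^{\mathsf T} Q x + c^{\mathsf T} x \\
\text{s.t.} & Ax = b \\
& Dx \ge d
\end{array}
\]

for some vectors and matrices \(Q \in \mathbb{R}^{n \times n}\), \(c \in \mathbb{R}^n\), \(b \in \mathbb{R}^m\), \(d \in \mathbb{R}^p\), \(A \in \mathbb{R}^{m \times n}\),
\(D \in \mathbb{R}^{p \times n}\). It is customary to assume that the matrix \(Q\) is symmetric. This
assumption can be made without loss of generality since

\[
x^{\mathsf T} Q x = x^{\mathsf T} \tilde{Q} x,
\]

where \(\tilde{Q} = \tfrac{1}{2}(Q + Q^{\mathsf T})\), which is clearly a symmetric matrix.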
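
The symmetrization argument can be checked numerically. The sketch below uses a hypothetical non-symmetric 2x2 matrix and test vector (both chosen only for illustration) and confirms that replacing the matrix by the average of itself and its transpose leaves the quadratic form unchanged.

```python
def quad_form(Q, x):
    """Evaluate the quadratic form x^T Q x for a square matrix Q (list of rows)."""
    n = len(x)
    return sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))


def symmetrize(Q):
    """Return (Q + Q^T) / 2, the symmetric part of Q."""
    n = len(Q)
    return [[0.5 * (Q[i][j] + Q[j][i]) for j in range(n)] for i in range(n)]


# Hypothetical data: Q is deliberately not symmetric.
Q = [[2.0, 3.0],
     [-1.0, 4.0]]
x = [1.5, -2.0]

Q_sym = symmetrize(Q)
value_original = quad_form(Q, x)
value_symmetric = quad_form(Q_sym, x)
```

The two values agree because the off-diagonal contributions \(Q_{ij} x_i x_j\) and \(Q_{ji} x_j x_i\) enter the sum only through their total, which averaging preserves.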

We note that a quadratic function \(\tfrac{1}{2} x^{\mathsf T} Q x + c^{\mathsf T} x\) is convex if and only if the
matrix \(Q\) is positive semidefinite (\(x^{\mathsf T} Q x \ge 0\) for all \(x \in \mathbb{R}^n\)). In this case the
above quadratic program is a convex optimization problem and can be solved
efficiently. The two best known methods for solving convex quadratic programs
are active-set methods and interior-point methods. We briefly discuss these
algorithms in Chapter 5.


Mixed Integer Programming 


A mixed-integer program is an optimization problem that restricts some or all of 
the decision variables to take integer values. In particular, a mixed integer linear 
programming model is a problem of the form 


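\[
\begin{array}{ll}
\min & c^{\mathsf T} x \\
\text{s.t.} & Ax = b \\
& Dx \ge d \\
& x_j \in \mathbb{Z}, \ j \in J
\end{array}
\]

for some vectors and matrices \(c \in \mathbb{R}^n\), \(b \in \mathbb{R}^m\), \(d \in \mathbb{R}^p\), \(A \in \mathbb{R}^{m \times n}\), \(D \in \mathbb{R}^{p \times n}\)
and some \(J \subseteq \{1,\dots,n\}\).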

An important case occurs when the model includes binary variables, that is, 
variables that are restricted to take values 0 or 1. As we will see, the inclusion 
of this type of constraint increases the modeling power but comes at a cost in 
terms of computational tractability. It is noteworthy that the computational and 
algorithmic machinery for solving mixed integer programs has vastly improved 
during the last couple of decades. The main classes of methods for solving 
mixed integer programs are branch and bound, cutting planes, and a combination 
of these two approaches known as branch and cut. We briefly discuss these 
algorithms in Chapter 8. 


Stochastic Optimization 


Stochastic optimization models are optimization problems that account for
randomness in their objective or constraints. The following formulation illustrates
a generic type of stochastic optimization problem

\[
\begin{array}{ll}
\min\limits_{x} & \mathbb{E}(F(x,\omega)) \\
\text{s.t.} & x \in \mathcal{X}.
\end{array}
\]

In this problem the set of decisions \(x\) must be made before a random outcome
\(\omega\) occurs. The goal is to optimize the expectation of some function that depends
on both the decision vector \(x\) and the random outcome \(\omega\). A variation of this
formulation, that has led to important developments, is to replace the expectation
by some kind of risk measure \(\varrho\) in the objective:

\[
\begin{array}{ll}
\min & \varrho(F(x,\omega)) \\
\text{s.t.} & x \in \mathcal{X}.
\end{array}
\]


There are numerous refinements and variants of the above two formulations. In 
particular, the class of two-stage stochastic optimization with recourse has been 
widely studied in the stochastic programming community. In this setting a set
of decisions \(x\) must be made in stage one. Between stage one and stage two
a random outcome \(\omega\) occurs. At stage two we have the opportunity to make
some second-stage recourse decisions \(y(\omega)\) that may depend on the random
outcome \(\omega\).

The two-stage stochastic optimization problem with recourse can be formally
stated as

\[
\begin{array}{ll}
\min & f(x) + \mathbb{E}[Q(x,\omega)] \\
\text{s.t.} & x \in \mathcal{X}.
\end{array}
\]

The recourse term \(Q(x,\omega)\) depends on the first-stage decisions \(x\) and the random
outcome \(\omega\). It is of the form

\[
\begin{array}{rll}
Q(x,\omega) = & \min\limits_{y(\omega)} & g(y(\omega), \omega) \\
& \text{s.t.} & y(\omega) \in \mathcal{Y}(x,\omega).
\end{array}
\]

The second-stage decisions \(y(\omega)\) are adaptive to the random outcome \(\omega\) because
they are made after \(\omega\) is revealed. The objective function in a two-stage stochastic
optimization problem contains a term for the stage-one decisions and a term for 
the stage-two decisions where the latter term involves an expectation over the 
random outcomes. The intuition of this objective function is that the stage-one 
decisions should be made considering what is to be expected in stage two. 
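
A minimal sketch may help fix ideas about the two-stage structure. The instance below is entirely hypothetical: a quantity x is bought in stage one at a unit cost, a demand is then revealed from a small finite set of scenarios, and any shortfall is covered in stage two at a higher unit cost. The recourse problem is simple enough to be solved in closed form, and the first-stage decision is found by plain enumeration over a small grid, which is an illustrative shortcut rather than a general solution method.

```python
# Hypothetical data for a toy two-stage problem with recourse.
first_stage_cost = 1.0                       # unit cost of the stage-one decision x
recourse_cost = 3.0                          # unit cost of the stage-two decision y
scenarios = [(0.3, 2), (0.5, 5), (0.2, 8)]   # (probability, demand) pairs


def recourse_value(x, demand):
    """Q(x, omega): cheapest y >= 0 with y >= demand - x, costing recourse_cost * y."""
    return recourse_cost * max(demand - x, 0)


def total_cost(x):
    """Stage-one cost plus the expected recourse cost over all scenarios."""
    expected_recourse = sum(p * recourse_value(x, dem) for p, dem in scenarios)
    return first_stage_cost * x + expected_recourse


# Enumerate a small grid of candidate stage-one decisions (an assumption of the sketch).
candidates = range(0, 11)
best_x = min(candidates, key=total_cost)
best_cost = total_cost(best_x)
```

With these numbers the enumeration selects x = 5: buying more in stage one is cheaper per unit, but only as long as the extra units are likely enough to be needed. This is the sense in which the stage-one decision accounts for what is expected in stage two.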

The above two-stage setting generalizes to a multi-stage context where the 
random outcome is revealed over time and decisions are made dynamically at 
multiple stages and can adapt to the information revealed up to their stage. 


Solution to Optimization Problems 


The solution to an optimization problem can often be characterized in terms of 
a set of optimality conditions. Optimality conditions are derived from the math- 
ematical relationship between the objective and constraints in the problem. Sub- 
sequent chapters discuss optimality conditions for various types of optimization 
problems. In special cases, these optimality conditions can be solved analytically 
and used to infer properties about the optimal solution. However, in many cases 
we rely on numerical solvers to obtain the solution to the optimization models. 

There are numerous software vendors that provide solvers for optimization 
problems. Throughout this book we will illustrate examples with two popular 
solvers, namely Excel Solver and the MATLAB®-based optimization modeling 
framework CVX. Excel and MATLAB files for the examples and exercises in the 
book are available at: 


www.andrew.cmu.edu/user/jfp/OIFbook/ 


Both Excel Solver and CVX enable us to solve small to medium-sized problems 
and are fairly easy to use. There are far more sophisticated solvers such as the 
commercial solvers IBM®-ILOG® CPLEX®, Gurobi, FICO® Xpress, and the ones
available via the open-source projects COIN-OR or SCIP. 

Optimization problems can be formulated using modeling languages such as 
AMPL, GAMS, MOSEL, or OPL. The need for these modeling languages arises 
when the size of the formulation is large. A modeling language lets people use 
common notation and familiar concepts to formulate optimization models and 
examine solutions. Most importantly, large problems can be formulated in a 
compact way. Once the problem has been formulated using a modeling language, 
it can be solved using any number of solvers. A user can switch between solvers 
with a single command and select options that may improve solver performance. 


Financial Optimization Models 


In this book we will focus on the use of optimization models for financial problems 
such as portfolio management, risk management, asset and liability management, 
trade execution, and dynamic asset management. Optimization models are also 
widely used in other areas of business, science, and engineering, but this will not 
be the subject of our discussion. 


Portfolio Management 


One of the best known optimization models in finance is the portfolio selection 
model of Markowitz (1952). Markowitz’s mean-variance approach led to major 
developments in financial economics including Tobin’s mutual fund theorem 
(Tobin, 1958) and the capital asset pricing model of Treynor,* Sharpe (1964),
Lintner (1965), and Mossin (1966). Markowitz was awarded the Nobel Prize in 
Economics in 1990 for the enormous influence of his work in financial theory and 
practice. The gist of this model is to formalize the principle of diversification 
when selecting a portfolio in a universe of risky assets. As we discuss in detail in 
Chapter 6, Markowitz’s mean-variance model and a wide range of its variations 
can be stated as a quadratic programming problem of the form 


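\[
\begin{array}{ll}
\min & \tfrac{1}{2}\gamma \cdot x^{\mathsf T} V x - \mu^{\mathsf T} x \\
\text{s.t.} & Ax = b \\
& Dx \ge d.
\end{array}
\tag{1.3}
\]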


The vector of decision variables \(x\) in model (1.3) represents the portfolio holdings.
These holdings typically represent the percentages invested in each asset and
thus are often subject to the full investment constraint \(\mathbf{1}^{\mathsf T} x = 1\). Other common
constraints include the long-only constraint \(x \ge 0\), as well as restrictions related
to sector or industry composition, turnover, etc. The terms \(x^{\mathsf T} V x\) and \(\mu^{\mathsf T} x\) in
the objective function are respectively the variance, which is a measure of risk,
and the expected return of the portfolio defined by \(x\). The risk-aversion constant
\(\gamma > 0\) in the objective determines the tradeoff between risk and return of the
portfolio.

* “Toward a theory of market value of risky assets”. Unpublished manuscript, 1961.
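
To illustrate how the objective of model (1.3) trades off risk against return, the following sketch evaluates it for a given portfolio. The covariance matrix, expected returns, holdings, and risk-aversion value are hypothetical numbers chosen only for illustration; the sketch evaluates the objective and does not solve the optimization problem.

```python
def mean_variance_objective(x, V, mu, gamma):
    """Return (gamma / 2) * x^T V x - mu^T x for holdings x."""
    n = len(x)
    variance = sum(x[i] * V[i][j] * x[j] for i in range(n) for j in range(n))
    expected_return = sum(mu[i] * x[i] for i in range(n))
    return 0.5 * gamma * variance - expected_return


# Hypothetical two-asset data.
V = [[0.04, 0.01],
     [0.01, 0.09]]      # covariance matrix of asset returns
mu = [0.06, 0.10]       # expected asset returns
x = [0.5, 0.5]          # holdings; they satisfy the full investment constraint
gamma = 2.0             # risk-aversion constant

objective_value = mean_variance_objective(x, V, mu, gamma)
```

Here the portfolio variance is 0.0375 and the expected return is 0.08, so the objective equals 0.0375 - 0.08 = -0.0425. A larger value of the risk-aversion constant weights the variance term more heavily and therefore favors lower-risk portfolios.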


Risk Management 


Risk is inherent in most economic activities. This is especially true of financial 
activities where results of decisions made today may have many possible different 
outcomes depending on future events. Since companies cannot usually insure 
themselves completely against risk, they have to manage it. This is a hard task 
even with the support of advanced mathematical techniques. Poor risk manage- 
ment led to several spectacular failures in the financial industry in the 1990s 
(e.g., Barings Bank, Long Term Capital Management, Orange County). It was 
also responsible for failures and bailouts of a number of institutions (e.g., Lehman 
Brothers, Bear Stearns, AIG) during the far more severe global financial crisis of 
2007-2008. Regulations, such as those prescribed by the Basel Accord (see Basel 
Committee on Banking Supervision, 2011), mandate that financial institutions 
control their risk via a variety of measurable requirements. The modeling of reg- 
ulatory constraints as well as other risk-related constraints that the firm wishes 
to impose to prevent vulnerabilities can often be stated as a set of constraints 


\[
\mathrm{RM}(x) \le b.
\tag{1.4}
\]


The vector x in (1.4) represents the holdings in a set of risky securities. The 
entries of the vector-valued function RM(x) represent one or more measures of
risk and the vector b represents the acceptable upper limits on these measures. 
The set of risk management constraints (1.4) may be embedded in a more 
elaborate model that aims to optimize some kind of performance measure such 
as expected investment return. 

In Chapter 2 we discuss a linear programming model for optimal bank planning 
under Basel III regulations. In this case the components of the function RM(x) 
are linear functions of x. In Chapter 11 we discuss more sophisticated risk 
measures such as value at risk and conditional value at risk that typically make 
RM(x) a nonlinear function of x. 


Asset and Liability Management 


How should a financial institution manage its assets and liabilities? A static 
model, such as the Markowitz mean-variance portfolio selection model, fails 
to incorporate the multi-period nature of typical liabilities faced by financial 
institutions. Furthermore, it penalizes returns both above and below the mean. 
A multi-period model that emphasizes the need to meet liabilities in each 
period for a finite (or possibly infinite) horizon is often more appropriate. Since 
liabilities and asset returns usually have random components, their optimal 
management requires techniques to optimize under uncertainty such as stochastic 
optimization. 


We discuss several asset and liability management models in Chapters 3, 16, 
and 17. A generic asset and liability management model can often be formulated 
as a stochastic programming problem of the form 


\[
\begin{array}{ll}
\max & \mathbb{E}(U(x)) \\
\text{s.t.} & Fx = L \\
& Dx \ge 0.
\end{array}
\tag{1.5}
\]


The vector x in (1.5) represents the investment decisions for the available assets 
at the dates in the planning horizon. The vector L in (1.5) represents the 
liabilities that the institution faces at the dates in the planning horizon. The 
constraints \(Fx = L\), \(Dx \ge 0\) represent the cash flow rules and restrictions
applicable to the assets during the planning horizon. The term \(U(x)\) in the
objective function is some appropriate measure of utility. For instance, it could
be the value of terminal wealth at the end of the planning horizon. In general,
the components \(F\), \(L\), \(D\) are discrete-time random processes and thus (1.5) is
a multi-stage stochastic programming model with recourse. In Chapter 3 we 
discuss some special cases of (1.5) with no randomness. 


Notes 


George Dantzig was the inventor of linear programming and author of many 
related articles as well as a classical reference on the subject (Dantzig, 1963). A 
particularly colorful and entertaining description of the diet problem, a classical 
linear programming model, can be found in Dantzig (1990). 

Boyd and Vandenberghe (2004) give an excellent exposition of convex opti- 
mization appropriate for senior or first-year graduate students in engineering. 
This book is freely available at: 


www.stanford.edu/~boyd/cvxbook/


Ragsdale (2007) gives a practical exposition of optimization and related 
spreadsheet models that circumvent most technical issues. It is appropriate for 
senior or Master’s students in business. 


Linear Programming: Theory and Algorithms


Linear programming is one of the most significant contributions to computational 
mathematics made in the twentieth century. This chapter introduces the main 
ideas behind linear programming theory and algorithms. It also introduces two 
easy-to-use solvers. 


Linear Programming 


A linear program is an optimization problem whose objective is to minimize or 
maximize a linear function subject to a finite set of linear equality and linear 
inequality constraints. By flipping signs if necessary, a linear program can always 
be written in the generic form: 


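\[
\begin{array}{ll}
\min\limits_{x} & c^{\mathsf T} x \\
\text{s.t.} & Ax = b \\
& Dx \ge d
\end{array}
\]

for some vectors and matrices \(c \in \mathbb{R}^n\), \(b \in \mathbb{R}^m\), \(d \in \mathbb{R}^p\), \(A \in \mathbb{R}^{m \times n}\), \(D \in \mathbb{R}^{p \times n}\).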
The terms linear programming model or linear optimization model are also used 
to refer to a linear program. We will use these terms interchangeably throughout 
the book. 

The following two simplified portfolio construction examples illustrate the use 
of linear programming as a modeling tool. 


Example 2.1 (Fund allocation) You would like to allocate $80,000 among four 
mutual funds that have different expected returns as well as different weights in 
large-, medium- and small-capitalization stocks. 


Capitalization   Fund 1   Fund 2   Fund 3   Fund 4
Large              50%      30%      25%      60%
Medium             30%      10%      40%      20%
Small              20%      60%      35%      20%
Exp. return        10%      15%      16%       8%


The allocation must contain at least 35% large-cap, 30% mid-cap, and 15%
small-cap stocks. Find an acceptable allocation with the highest expected return 
assuming you are only allowed to hold long positions in the funds. 


This problem can be formulated as the following linear programming model. 


Linear programming model for fund allocation 


Variables: 
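\(x_i\): amount (in $1000s) invested in fund \(i\) for \(i = 1, \dots, 4\).
Objective:
\[
\max \; 0.10x_1 + 0.15x_2 + 0.16x_3 + 0.08x_4.
\]
Constraints:
\[
\begin{array}{rcll}
0.50x_1 + 0.30x_2 + 0.25x_3 + 0.60x_4 & \ge & 0.35 \times 80 & \text{(large-cap)} \\
0.30x_1 + 0.10x_2 + 0.40x_3 + 0.20x_4 & \ge & 0.30 \times 80 & \text{(mid-cap)} \\
0.20x_1 + 0.60x_2 + 0.35x_3 + 0.20x_4 & \ge & 0.15 \times 80 & \text{(small-cap)} \\
x_1 + x_2 + x_3 + x_4 & = & 80 & \text{(money to allocate)} \\
x_1, \dots, x_4 & \ge & 0 & \text{(long-only positions).}
\end{array}
\]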
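
Before handing a model like this to a solver, it can be useful to check candidate allocations by hand. The sketch below encodes the data of Example 2.1 and tests whether a given allocation satisfies all constraints. The helper names, the tolerance, and the two candidate allocations are illustrative assumptions; the sketch only checks feasibility and makes no claim that either candidate is optimal.

```python
# Data from Example 2.1 (amounts in $1000s).
returns = [0.10, 0.15, 0.16, 0.08]        # expected return of each fund
D = [[0.50, 0.30, 0.25, 0.60],            # large-cap weight of each fund
     [0.30, 0.10, 0.40, 0.20],            # mid-cap weight of each fund
     [0.20, 0.60, 0.35, 0.20]]            # small-cap weight of each fund
budget = 80.0
d = [0.35 * budget, 0.30 * budget, 0.15 * budget]   # minimum exposures


def check_allocation(x, tol=1e-9):
    """True if x is fully invested, long-only, and meets every minimum exposure."""
    fully_invested = abs(sum(x) - budget) <= tol
    long_only = all(xi >= -tol for xi in x)
    exposures = [sum(w * xi for w, xi in zip(row, x)) for row in D]
    exposures_ok = all(e >= di - tol for e, di in zip(exposures, d))
    return fully_invested and long_only and exposures_ok


def expected_return(x):
    """Expected return of allocation x, in $1000s."""
    return sum(r * xi for r, xi in zip(returns, x))


equal_split = [20.0, 20.0, 20.0, 20.0]    # hypothetical candidate
candidate = [32.0, 0.0, 48.0, 0.0]        # hypothetical candidate
```

The equal split fails the mid-cap requirement (its mid-cap exposure is 20, below the required 24), while the second candidate satisfies all constraints with an expected return of 10.88, that is, $10,880.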


Example 2.2 (Bond allocation) A bond portfolio manager has $100,000 to 
allocate to two different bonds: a corporate bond and a government bond. These 
bonds have the following yield, risk level, and maturity: 


Bond Yield Risk level Maturity 
Corporate 4% 2 3 years 
Government 3% 1 4 years 


The portfolio manager would like to allocate the funds so that the average risk 
level of the portfolio is at most 1.5 and the average maturity is at most 3.6 years. 
Any amount not invested in the bonds will be kept in a cash account that is 
assumed to generate no interest and does not contribute to the average risk level 
or maturity. In other words, assume cash has zero yield, zero risk level, and zero 
maturity. 

How should the manager allocate funds to the two bonds to maximize yield? 
Assume the portfolio can only include long positions. 


This problem can be formulated as the following linear programming model. 


Linear programming model for bond allocation 
Variables: 


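\(x_1, x_2\): amounts (in $1000s) invested in the corporate and government
bonds respectively.
Objective:
\[
\max \; 4x_1 + 3x_2.
\]
Constraints:
\[
\begin{array}{rcll}
x_1 + x_2 & \le & 100 & \text{(total funds)} \\[0.5ex]
\dfrac{2x_1 + x_2}{100} & \le & 1.5 & \text{(risk level)} \\[1.5ex]
\dfrac{3x_1 + 4x_2}{100} & \le & 3.6 & \text{(maturity)} \\[1ex]
x_1, x_2 & \ge & 0 & \text{(long-only positions)}
\end{array}
\]

or equivalently

\[
\begin{array}{lrcll}
\max & 4x_1 + 3x_2 & & & \\
\text{s.t.} & x_1 + x_2 & \le & 100 & \text{(total funds)} \\
& 2x_1 + x_2 & \le & 150 & \text{(risk level)} \\
& 3x_1 + 4x_2 & \le & 360 & \text{(maturity)} \\
& x_1, x_2 & \ge & 0 & \text{(long-only positions).}
\end{array}
\]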


The linear programming model in Example 2.1 can be written more concisely 
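using matrix-vector notation as follows:

\[
\begin{array}{ll}
\max & r^{\mathsf T} x \\
\text{s.t.} & Ax = b \\
& Dx \ge d \\
& x \ge 0,
\end{array}
\]

where
\[
r = \begin{bmatrix} 0.10 \\ 0.15 \\ 0.16 \\ 0.08 \end{bmatrix}, \quad
A = \begin{bmatrix} 1 & 1 & 1 & 1 \end{bmatrix}, \quad
b = 80, \quad
D = \begin{bmatrix} 0.5 & 0.3 & 0.25 & 0.6 \\ 0.3 & 0.1 & 0.4 & 0.2 \\ 0.2 & 0.6 & 0.35 & 0.2 \end{bmatrix}, \quad \text{and} \quad
d = \begin{bmatrix} 28 \\ 24 \\ 12 \end{bmatrix}.
\]

Likewise, the linear programming model in Example 2.2 can be written as

\[
\begin{array}{ll}
\max & r^{\mathsf T} x \\
\text{s.t.} & Ax \le b \\
& x \ge 0,
\end{array}
\]

for
\[
r = \begin{bmatrix} 4 \\ 3 \end{bmatrix}, \quad
A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 4 \end{bmatrix}, \quad \text{and} \quad
b = \begin{bmatrix} 100 \\ 150 \\ 360 \end{bmatrix}.
\]
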
A linear programming model is in standard form if it is written as follows: 
\[
\begin{array}{ll}
\min & c^{\mathsf T} x \\
\text{s.t.} & Ax = b \\
& x \ge 0.
\end{array}
\]


The standard form is a kind of formatting convention that is used by some 
solvers. It is also particularly convenient to describe the most popular algorithms 
for solving linear programming, namely the simplex and interior-point methods. 
The standard form is not restrictive. Any linear program can be rewritten in
standard form. In particular, inequality constraints (other than non-negativity) 
can be rewritten as equality constraints after the introduction of a so-called slack 
or surplus variable. For instance, the linear program from Example 2.2 can be 
written as 


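\[
\begin{array}{lrcl}
\max & 4x_1 + 3x_2 & & \\
\text{s.t.} & x_1 + x_2 + x_3 & = & 100 \\
& 2x_1 + x_2 + x_4 & = & 150 \\
& 3x_1 + 4x_2 + x_5 & = & 360 \\
& x_1, x_2, x_3, x_4, x_5 & \ge & 0.
\end{array}
\]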


More generally, a linear program of the form 


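\[
\begin{array}{ll}
\min & c^{\mathsf T} x \\
\text{s.t.} & Ax \le b \\
& x \ge 0
\end{array}
\]
can be rewritten as
\[
\begin{array}{ll}
\min & c^{\mathsf T} x \\
\text{s.t.} & Ax + s = b \\
& x, s \ge 0.
\end{array}
\]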


It can then be rewritten, using matrix notation, in the following standard form: 


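\[
\begin{array}{ll}
\min & \begin{bmatrix} c \\ 0 \end{bmatrix}^{\mathsf T} \begin{bmatrix} x \\ s \end{bmatrix} \\[2ex]
\text{s.t.} & \begin{bmatrix} A & I \end{bmatrix} \begin{bmatrix} x \\ s \end{bmatrix} = b \\[2ex]
& \begin{bmatrix} x \\ s \end{bmatrix} \ge 0.
\end{array}
\]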
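
The conversion to standard form is mechanical, as the following sketch suggests. It appends one slack variable per inequality constraint, using the data of Example 2.2 with the objective negated so that the maximization becomes a minimization. The function names and the test point are illustrative assumptions, and matrices are stored as plain lists of rows.

```python
def to_standard_form(A, b, c):
    """Rewrite  min c^T x  s.t. Ax <= b, x >= 0  as  min c_std^T z  s.t. A_std z = b, z >= 0.

    The new variable vector is z = (x, s), with one slack variable per row of A.
    """
    m = len(A)
    A_std = [list(row) + [1.0 if i == k else 0.0 for k in range(m)]
             for i, row in enumerate(A)]
    c_std = list(c) + [0.0] * m
    return A_std, list(b), c_std


def slacks(A, b, x):
    """Slack values s = b - Ax for a given point x."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]


# Data of Example 2.2; maximizing 4*x1 + 3*x2 is the same as minimizing -4*x1 - 3*x2.
A = [[1.0, 1.0], [2.0, 1.0], [3.0, 4.0]]
b = [100.0, 150.0, 360.0]
c = [-4.0, -3.0]

A_std, b_std, c_std = to_standard_form(A, b, c)

x = [50.0, 50.0]                 # a feasible test point (chosen for illustration)
z = x + slacks(A, b, x)          # the corresponding point in standard form
residual = [sum(aij * zj for aij, zj in zip(row, z)) - bi
            for row, bi in zip(A_std, b_std)]
```

At the test point the slacks are (0, 0, 10): the total-funds and risk-level constraints are binding, while the maturity constraint has 10 units of room. Each equality of the standard form holds exactly, so `residual` is the zero vector.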


Unrestricted variables can be expressed as the difference of two new non-negative 
variables. For example, consider the linear program 

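\[
\begin{array}{ll}
\min & c^{\mathsf T} x \\
\text{s.t.} & Ax \le b.
\end{array}
\]

The unrestricted variable \(x\) can be replaced by \(u - v\) where \(u, v \ge 0\). Hence the
above linear program can be rewritten as
\[
\begin{array}{ll}
\min & c^{\mathsf T}(u - v) \\
\text{s.t.} & A(u - v) \le b \\
& u, v \ge 0.
\end{array}
\]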
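It can also be rewritten, after adding slack variables and using matrix notation,
in the following standard form:

\[
\begin{array}{ll}
\min & \begin{bmatrix} c \\ -c \\ 0 \end{bmatrix}^{\mathsf T} \begin{bmatrix} u \\ v \\ s \end{bmatrix} \\[3ex]
\text{s.t.} & \begin{bmatrix} A & -A & I \end{bmatrix} \begin{bmatrix} u \\ v \\ s \end{bmatrix} = b \\[3ex]
& \begin{bmatrix} u \\ v \\ s \end{bmatrix} \ge 0.
\end{array}
\]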


Graphical Interpretation of a Two-Variable Example 


Banks need to consider regulations when determining their business strategy. In 
this section, we consider the Basel III regulations (Basel Committee on Bank- 
ing Supervision, 2011). We present a simplified example following the paper of 
Pokutta and Schmaltz (2012). Consider a bank with total deposits D and loans 
L. The loans may default and the deposits are exposed to early withdrawal. The 
bank holds capital C in order to buffer against possible default losses on the 
loans, and it holds a liquidity reserve R to buffer against early withdrawals on 
the deposits. The balance sheet of the bank satisfies L+ R = D+C. Normalizing 
the total assets to 1, we have R = 1 — L and C = 1 — D. Basel III regulations 
require banks to satisfy four minimum ratio constraints in order to buffer against 
different types of risk: 


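\[
\begin{array}{ll}
\text{Capital ratio:} & \dfrac{C}{L} \ge r_1 \\[2ex]
\text{Leverage ratio:} & C \ge r_2 \\[1ex]
\text{Liquidity coverage ratio:} & \dfrac{R}{D} \ge r_3 \\[2ex]
\text{Net stable funding ratio:} & \dfrac{\alpha D + C}{L} \ge r_4,
\end{array}
\]

where the ratios \(r_1, r_2, r_3, r_4, \alpha\) are computed for each bank based on the riskiness
of its loans and the likelihood of early withdrawals on deposits. To illustrate,
consider a bank with \(r_1 = 0.3\), \(r_2 = 0.1\), \(r_3 = 0.25\), \(r_4 = 0.7\), \(\alpha = 0.3\). Expressing
the four ratio constraints in terms of the variables \(D\) and \(L\), we get

\[
\begin{array}{rcl}
D + 0.3L & \le & 1 \\
D & \le & 0.9 \\
0.25D + L & \le & 1 \\
0.7D + 0.7L & \le & 1.
\end{array}
\]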

Figure 2.1 displays a plot of the feasible region of this system of inequalities in 
the plane (D, L). 

Figure 2.1 Basel III regulations (feasible region in the plane of deposits D and
loans L, with the leverage ratio, liquidity coverage ratio, and net stable funding
ratio constraints and the objective direction marked)


Given this feasible region, the objective of the bank is to maximize the margin
income \(m_D D + m_L L\) that it makes on its products, where \(m_D\) is the margin that
the bank makes on its deposits and \(m_L\) is the margin charged on its loans. For
example, if \(m_D = 0.02\) and \(m_L = 0.03\), the best solution that satisfies all the
constraints corresponds to the vertex D = 0.571, L = 0.857 on the boundary of
the feasible region, at the intersection of the lines 0.25D + L = 1 and
0.7D + 0.7L = 1. This means that the bank should have 57.1% of its liabilities in
deposits and 42.9% in capital, and it should have 85.7% of its assets in loans and
the remaining 14.3% in liquidity reserve. The fact that an optimal solution occurs
at a vertex of the feasible region is a property of linear programs that extends
to higher dimensions than 2: To find an optimal solution of a linear program,
it suffices to restrict the search to vertices of the feasible region. This geometric
insight is the basis of the simplex method, which goes from one vertex of the
feasible region to an adjacent one with a better objective value until it reaches
an optimum. An algebraic description of the simplex method that can be coded
on a computer is presented in Section 2.7.1.
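The vertex property can be checked directly on the Basel III example. The
following Python sketch is an illustration only and is not part of the original
text: it intersects every pair of constraint lines, keeps the intersection points
that satisfy all constraints, and evaluates the margin income at each of them.
The names `CONSTRAINTS`, `vertices`, and `best_vertex` are hypothetical, and the
non-negativity of D and L is an added assumption.

```python
from itertools import combinations

# Each constraint reads a1*D + a2*L <= rhs.
CONSTRAINTS = [
    (1.0, 0.3, 1.0),    # capital ratio:            D + 0.3 L <= 1
    (1.0, 0.0, 0.9),    # leverage ratio:           D <= 0.9
    (0.25, 1.0, 1.0),   # liquidity coverage ratio: 0.25 D + L <= 1
    (0.7, 0.7, 1.0),    # net stable funding ratio: 0.7 D + 0.7 L <= 1
    (-1.0, 0.0, 0.0),   # D >= 0 (assumption added for this sketch)
    (0.0, -1.0, 0.0),   # L >= 0 (assumption added for this sketch)
]

def feasible(point, tol=1e-9):
    d, l = point
    return all(a1 * d + a2 * l <= rhs + tol for a1, a2, rhs in CONSTRAINTS)

def vertices():
    """Intersect every pair of constraint lines and keep the feasible points."""
    found = []
    for (a1, a2, r), (b1, b2, q) in combinations(CONSTRAINTS, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:          # parallel lines never meet
            continue
        # Cramer's rule for the 2x2 system of the two boundary lines.
        point = ((r * b2 - a2 * q) / det, (a1 * q - r * b1) / det)
        if feasible(point):
            found.append(point)
    return found

def best_vertex(m_d, m_l):
    """Return the vertex with the largest margin income m_d*D + m_l*L."""
    return max(vertices(), key=lambda p: m_d * p[0] + m_l * p[1])

D_opt, L_opt = best_vertex(0.02, 0.03)
print(round(D_opt, 3), round(L_opt, 3))
```

Enumerating all vertices is only practical in very small examples like this one;
the simplex method avoids it by moving between adjacent vertices.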


Numerical Linear Programming Solvers 


There are a variety of both commercial and open-source software packages for 
linear programming. Most of these packages implement the algorithms described 
in Section 2.7 below. Next we illustrate two of these solvers by applying them to 
Example 2.1. 


Excel Solver 


Figure 2.2 displays a printout of an Excel spreadsheet implementation of the 
linear programming model for Example 2.1 as well as the dialog box obtained 
when we run the Excel add-in Solver. The spreadsheet model contains the three
components of the linear program. The decision variables are in the range B4:E4.
The objective is in cell F3. The left- and right-hand sides of the equality con- 
straint are in the cells F4 and H4 respectively. Likewise, the left- and right- 
hand sides of the three inequality constraints are in the ranges F8:F10 and 
H8:H10 respectively. These components are specified in the Solver dialog box. 
In addition, the Solver options are used to indicate that this is a linear model 
and that the variables are non-negative.


Figure 2.2 Spreadsheet implementation and the Solver dialog box for the fund
allocation model


MATLAB CVX 


Figure 2.3 displays a CVX script for the same problem. The script can be run 
provided the freely available CVX toolbox is installed. 


Both Excel Solver and MATLAB CVX find the following optimal solution to
the problem in Example 2.1:

\[
x^* = \begin{bmatrix} 0.0000 \\ 12.6316 \\ 46.3158 \\ 21.0526 \end{bmatrix},
\]

and the corresponding optimal objective value 10.9895 (recall that the units are
in thousands of dollars).


    % MATLAB CVX code for the fund allocation problem
    n = 4;
    r = [0.10; 0.15; 0.16; 0.08];
    A = [1 1 1 1];
    b = 80;
    D = [0.5 0.3 0.25 0.6;
         0.3 0.1 0.4  0.2;
         0.2 0.6 0.35 0.2];
    d = [28; 24; 12];

    cvx_begin
        variables x(n);
        maximize(r'*x)
        A*x == b;
        D*x >= d;
        x >= zeros(n,1);
    cvx_end

Figure 2.3 MATLAB CVX code for the fund allocation model


Sensitivity Analysis


In addition to the optimal solution, the process of solving a linear program also
generates some interesting sensitivity information via the so-called shadow prices
or dual values associated with the constraints. Assume that the constraints of a
linear program, and hence the shadow prices, are indexed by i = 1,...,m. The
shadow price \(y_i^*\) of the ith constraint has the following sensitivity interpretation:

    If the right-hand side of the ith constraint changes by \(\Delta\), then the optimal
    value of the linear program changes by \(\Delta \cdot y_i^*\) as long as \(\Delta\) is within a
    certain range.

Both Excel Solver and MATLAB CVX compute the shadow prices implicitly.
To make this information explicit in Excel Solver we request a sensitivity report
after running it as shown in Figure 2.4.


Figure 2.4 Requesting sensitivity report in Solver


Figure 2.5 displays the sensitivity report for Example 2.1.

                                 Final         Reduced        Objective     Allowable     Allowable
Cell    Name                     Value         Cost           Coefficient   Increase      Decrease
$B$4    $$ invested Fund 1       0             -0.00263158    0.1           0.00263158    1.00E+30
$C$4    $$ invested Fund 2       12.63157895   0              0.15          0.0166667     0.00142857
$D$4    $$ invested Fund 3       46.31578947   0              0.16          0.00166667    0.00625
$E$4    $$ invested Fund 4       21.05263158   0              0.08          0.01          0.00357143

Constraints

                                 Final         Shadow         Constraint    Allowable     Allowable
Cell    Name                     Value         Price          R.H. Side     Increase      Decrease
$F$4    $$ invested Total        80.0000       0.22           80            21.0526       6.31579
$F$8    large in portfolio       28.0000       -0.231579      28            6             6.66667
$F$9    medium in portfolio      24.0000       -0.00526316    24            3.42857       14.6667
$F$10   small in portfolio       28.0000       0              12            16            1.00E+30


Figure 2.5 Sensitivity report


The values \(y_i^*\) can be found in the column labeled “Shadow Price”. In addition,
the “Allowable Increase” and “Allowable Decrease” columns indicate the
range of change for each right-hand side of a constraint where the sensitivity
analysis holds. For example, if the right-hand side of the large-capitalization
constraint


\[
0.5x_1 + 0.3x_2 + 0.25x_3 + 0.6x_4 \ge 28
\]


changes from 28 to \(28 + \Delta\), then the optimal value changes by \(-0.231579 \cdot \Delta\). This
holds provided \(\Delta\) is within the allowable range [-6.6666, 6]. If the requirement
on large-cap stocks is reduced from 35% to 30%, the change in right-hand side is
\(\Delta = -0.05 \cdot 80 = -4\), which is within the allowable range. Therefore the optimal
objective value increases by \(-0.231579 \cdot (-4) = 0.926316\). Because our units are
in $1000, this means that the expected return on an optimal portfolio would
increase by $926.32 if we relaxed the constraint on large-cap stocks by 5%, from
35% to 30%.
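The arithmetic above can be packaged in a few lines. The following Python sketch
is an illustration only; the function name `predicted_change` is hypothetical,
and the numbers are the large-cap row of the sensitivity report in Figure 2.5.

```python
def predicted_change(shadow_price, delta, allowable_increase, allowable_decrease):
    """Return shadow_price * delta, or None when delta leaves the allowable range."""
    if -allowable_decrease <= delta <= allowable_increase:
        return shadow_price * delta
    return None

# Large-cap requirement lowered from 35% to 30% of the 80 (thousand dollars) invested.
delta = (0.30 - 0.35) * 80
change = predicted_change(-0.231579, delta, 6.0, 6.66667)
print(change)   # change in the optimal expected return, in thousands of dollars
```

Returning `None` outside the allowable range reflects the fact that the shadow
price says nothing about the size of the change there; the linear program would
have to be solved again.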

The shadow prices of the non-negativity constraints are the “Reduced Cost” 
displayed in the initial part of the sensitivity report. This is also the convention 
for more general lower and upper bounds on the decision variables. Observe that 
in Example 2.1 the reduced costs of the non-zero variables are zero. The reduced 
costs also have a deeper meaning in the context of the simplex algorithm for 
linear programming as described in Section 2.7.1 below. 

A linear programming model is non-degenerate if all of the allowable increase 
and allowable decrease limits are positive. The above linear programming model 
is non-degenerate. 

In CVX this information can also be obtained by including a few additional 
pieces of code to save the dual information in the dual variables y,z as shown 
in Figure 2.6.

    % MATLAB CVX code for the fund allocation problem
    % Request dual variables
    n = 4;
    r = [0.10; 0.15; 0.16; 0.08];
    A = [1 1 1 1];
    b = 80;
    D = [0.5 0.3 0.25 0.6;
         0.3 0.1 0.4  0.2;
         0.2 0.6 0.35 0.2];
    d = [28; 24; 12];

    cvx_begin
        variables x(n);
        dual variable y;
        dual variable z;
        maximize(r'*x)
        y : A*x == b;
        z : D*x >= d;
        x >= zeros(n,1);
    cvx_end


Figure 2.6 MATLAB CVX code with dual variables


Both solvers yield the following dual values:

\[
y^* = 0.22, \qquad z^* = \begin{bmatrix} -0.231579 \\ -0.005263 \\ 0 \end{bmatrix}.
\]

We note that some solvers may flip the sign of the dual values. In particular,
the output of the above CVX code yields the values \(-0.22\) and
\([0.231579 \;\; 0.005263 \;\; 0]^\top\). It is
important to be mindful of this subtlety when interpreting the dual information.
The ambiguity can be easily resolved by thinking in terms of sensitivity analysis.
In this particular example, it is clear that the shadow price of the first constraint
should be non-negative as more capital should lead to a higher return. Likewise,
it is clear that the shadow prices of the other constraints should be non-positive
as more stringent diversification constraints, e.g., a higher percentage in large cap,
reduce the set of feasible portfolios and hence can only lead to portfolios with
return less than or equal to the optimal return of the original problem.
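One way to settle the sign question in practice is to plug the reported numbers
back into the model. The following Python sketch is an illustration only: it uses
the data of Figure 2.3, the reported solution, and the dual values with the signs
argued for above, and checks that the primal value and the dual value
\(b\,y^* + d^\top z^*\) agree up to rounding. The variable names are hypothetical.

```python
r = [0.10, 0.15, 0.16, 0.08]           # expected returns (Figure 2.3)
b = 80.0                                # total amount to invest
D = [[0.5, 0.3, 0.25, 0.6],
     [0.3, 0.1, 0.40, 0.2],
     [0.2, 0.6, 0.35, 0.2]]
d = [28.0, 24.0, 12.0]

x = [0.0, 12.6316, 46.3158, 21.0526]    # reported optimal solution
y = 0.22                                # shadow price of the budget constraint
z = [-0.231579, -0.005263, 0.0]         # shadow prices of the rows of D x >= d

primal_value = sum(ri * xi for ri, xi in zip(r, x))
dual_value = b * y + sum(di * zi for di, zi in zip(d, z))

# Reduced cost of fund j: r_j - y - (D' z)_j. At an optimum of this
# maximization problem it is non-positive, and zero for funds that are held.
reduced_costs = [r[j] - y - sum(D[i][j] * z[i] for i in range(3)) for j in range(4)]

print(round(primal_value, 4), round(dual_value, 4))
print([round(v, 6) for v in reduced_costs])
```

With the flipped signs the two values would not match, which makes this a quick
test of which convention a given solver uses.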


*Duality 


Every linear program has an associated dual linear programming problem. The 
properties of these two linear programs and how they are related to each other 
have deep implications. In particular, duality enables us to answer the following 
kinds of questions: 


• Can we recognize an optimal solution?
• Can we construct an algorithm to find an optimal solution?
• Can we assess how suboptimal a current feasible solution is?


The attentive reader may have noticed that dual variables were already mentioned
in Section 2.4 when discussing sensitivity analysis with CVX. This is not a
coincidence. There is a close connection between duality and sensitivity analysis.
The vector of shadow prices of the constraints of a linear program corresponds 
precisely to the optimal solution of its dual. 

Consider the following linear program in standard form, which we shall refer 
to as the primal problem: 


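\[
\begin{array}{rl}
\min & c^\top x \\
\text{s.t.} & Ax = b \\
& x \ge 0.
\end{array}
\tag{2.1}
\]


The following linear program is called the dual problem:


\[
\begin{array}{rl}
\max & b^\top y \\
\text{s.t.} & A^\top y \le c.
\end{array}
\tag{2.2}
\]


Sometimes it is convenient to rewrite the constraints in the dual problem as
equality constraints by means of slack variables. That is, problem (2.2) can also
be written as


\[
\begin{array}{rl}
\max & b^\top y \\
\text{s.t.} & A^\top y + s = c \\
& s \ge 0.
\end{array}
\tag{2.3}
\]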


There is a deep connection between the primal and dual problems. The next 
result follows by construction. 


Theorem 2.3 (Weak duality) Assume x is a feasible point for (2.1) and y is a
feasible point for (2.2). Then


\[
b^\top y \le c^\top x.
\]


Proof Under the assumptions on x and y it follows that


\[
b^\top y = (Ax)^\top y = (A^\top y)^\top x \le c^\top x,
\]

where the inequality holds because \(A^\top y \le c\) and \(x \ge 0\).

The following (not so straightforward) result also holds.


Theorem 2.4 (Strong duality) Assume one of the problems (2.1) or (2.2) is 
feasible. Then this problem is bounded if and only if the other one is feasible. In 
that case both problems have optimal solutions and their optimal values are the 
same. 


We refer the reader to Bertsimas and Tsitsiklis (1997) or Chvátal (1983) for 
a proof of Theorem 2.4. This result is closely related to the following classical 
properties of linear inequality systems. 


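Theorem 2.5 Assume \(A \in \mathbb{R}^{m \times n}\) and \(b \in \mathbb{R}^m\). In each of the following cases
exactly one of the systems (I) or (II) has a solution.


(a) Farkas’s lemma

\[
\begin{array}{ll}
Ax = b, \;\; x \ge 0, & \text{(I)} \\
A^\top y \le 0, \;\; b^\top y > 0. & \text{(II)}
\end{array}
\]

(b) Gordan’s theorem

\[
\begin{array}{ll}
Ax = 0, \;\; x \ge 0, \;\; x \ne 0, & \text{(I)} \\
A^\top y > 0. & \text{(II)}
\end{array}
\]

(c) Stiemke’s theorem

\[
\begin{array}{ll}
Ax = 0, \;\; x > 0, & \text{(I)} \\
A^\top y \ge 0, \;\; A^\top y \ne 0. & \text{(II)}
\end{array}
\]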


The equivalence between Theorems 2.4 and 2.5 is explored in Exercises 2.11 
and 2.12. 

We next present a derivation of the dual problem via the so-called Lagrangian 
function. This derivation has the advantage of introducing an important concept 
that we will encounter again in later chapters. Associated with the optimization 
problem (2.1) consider the Lagrangian function defined by 


\[
L(x, y, s) := c^\top x + y^\top (b - Ax) - s^\top x.
\]


The constraints of (2.1) can be encoded using the Lagrangian function via the
following observation: For a given vector x


\[
\max_{y,\; s \ge 0} L(x, y, s) =
\begin{cases}
c^\top x & \text{if } Ax = b \text{ and } x \ge 0 \\
+\infty & \text{otherwise.}
\end{cases}
\]


Therefore the primal problem (2.1) can be written as


\[
\min_{x} \; \max_{y,\; s \ge 0} L(x, y, s). \tag{2.4}
\]

On the other hand, observe that \(L(x, y, s) = b^\top y + (c - A^\top y - s)^\top x\). Hence for
a given pair of vectors (y, s)

\[
\min_{x} L(x, y, s) =
\begin{cases}
b^\top y & \text{if } A^\top y + s = c \\
-\infty & \text{otherwise.}
\end{cases}
\]


The dual problem is obtained by flipping the order of the min and max operations
in (2.4). Indeed, observe that the dual problem (2.3) can be written as


\[
\max_{y,\; s \ge 0} \; \min_{x} L(x, y, s).
\]


A similar procedure can be applied to obtain the dual of a linear program that 
is not necessarily in standard form. For example, the primal problem 


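\[
\begin{array}{rl}
\min & c^\top x \\
\text{s.t.} & Ax \ge b \\
& x \ge 0
\end{array}
\tag{2.5}
\]


can be written as


\[
\min_{x} \; \max_{y \ge 0,\; s \ge 0} L(x, y, s)
\]


for \(L(x, y, s) = c^\top x + y^\top (b - Ax) - s^\top x\). In this case the dual problem is


\[
\max_{y \ge 0,\; s \ge 0} \; \min_{x} L(x, y, s)
\]


and can be rewritten as


\[
\begin{array}{rl}
\max & b^\top y \\
\text{s.t.} & A^\top y \le c \\
& y \ge 0.
\end{array}
\tag{2.6}
\]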

Again the weak and strong duality properties hold for the pair of problems (2.5) 
and (2.6). 
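
Consider the linear programming model of Example 2.1, namely


\[
\begin{array}{rl}
\max\limits_{x} & r^\top x \\
\text{s.t.} & Ax = b \\
& Dx \ge d \\
& x \ge 0.
\end{array}
\tag{2.7}
\]


We give a derivation for its dual. Observe that (2.7) can be recast as


\[
\max_{x} \; \min_{y,\; w \ge 0,\; s \ge 0} L(x, y, w, s)
\]


for

\[
\begin{aligned}
L(x, y, w, s) &= r^\top x + y^\top (b - Ax) + w^\top (Dx - d) + s^\top x \\
&= b^\top y - d^\top w + x^\top (r - A^\top y + D^\top w + s).
\end{aligned}
\]


It follows that its dual \(\min_{y,\, w \ge 0,\, s \ge 0} \max_{x} L(x, y, w, s)\) can be rewritten as


\[
\begin{array}{rl}
\min\limits_{y,\, w} & b^\top y - d^\top w \\
\text{s.t.} & A^\top y - D^\top w \ge r \\
& w \ge 0.
\end{array}
\tag{2.8}
\]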


An alternative way to obtain the dual (2.8) is to rewrite (2.7) in standard form 
and derive its standard dual. The latter turns out to be equivalent to (2.8). (See 
Exercise 2.6.) 


*Optimality Conditions 


Consider again the linear programming problem (2.1). A powerful consequence 
of Theorem 2.4 is a set of optimality conditions that completely characterize the 
solutions to both (2.1) and (2.2).

Theorem 2.6 (Optimality conditions) The vectors \(x \in \mathbb{R}^n\) and \((y, s) \in \mathbb{R}^m \times \mathbb{R}^n\)
are respectively optimal solutions to (2.1) and (2.3) if and only if they satisfy the
following system of equations and inequalities:


\[
\begin{aligned}
A^\top y + s &= c \\
Ax &= b \\
x, s &\ge 0 \\
x_i s_i &= 0, \quad i = 1, \dots, n.
\end{aligned}
\tag{2.9}
\]


The equations \(x_i s_i = 0\) are known as the complementary slackness conditions.
They imply that, if a dual constraint \((A^\top y)_i \le c_i\) holds strictly (that is,
\((A^\top y)_i < c_i\)), then the corresponding primal variable \(x_i\) must be 0. And
conversely, if \(x_i > 0\), the corresponding dual constraint is tight, that is, \((A^\top y)_i = c_i\).
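The conditions (2.9) are easy to test numerically once candidate vectors are
available. The following Python sketch is an illustration only. It uses the
standard-form data of Example 2.2 as they appear later in this chapter, and the
dual vector and slack shown were worked out by hand for this illustration. The
function name `satisfies_optimality_conditions` is hypothetical.

```python
# Standard-form data: min c'x  s.t.  Ax = b, x >= 0 (Example 2.2).
A = [[1, 1, 1, 0, 0],
     [2, 1, 0, 1, 0],
     [3, 4, 0, 0, 1]]
b = [100, 150, 360]
c = [-4, -3, 0, 0, 0]

def satisfies_optimality_conditions(x, y, s):
    """Check the four groups of conditions in (2.9) with exact integer arithmetic."""
    m, n = len(A), len(c)
    dual_eq = all(sum(A[i][j] * y[i] for i in range(m)) + s[j] == c[j] for j in range(n))
    primal_eq = all(sum(A[i][j] * x[j] for j in range(n)) == b[i] for i in range(m))
    nonneg = all(v >= 0 for v in x) and all(v >= 0 for v in s)
    complementary = all(x[j] * s[j] == 0 for j in range(n))
    return dual_eq and primal_eq and nonneg and complementary

x_star = [50, 50, 0, 0, 10]     # optimal vertex of Example 2.2
y_star = [-2, -1, 0]            # dual solution (hand-computed for this sketch)
s_star = [0, 0, 2, 1, 0]        # dual slack c - A'y

# A non-optimal vertex: primal feasible and complementary, but s has a negative entry.
x_other, y_other, s_other = [75, 0, 25, 0, 135], [0, -2, 0], [0, -1, 0, 2, 0]

print(satisfies_optimality_conditions(x_star, y_star, s_star))
print(satisfies_optimality_conditions(x_other, y_other, s_other))
```

The second triple fails only the sign condition on s, which is exactly the
condition the primal simplex method works towards.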

The optimality conditions (2.9) provide an avenue for constructing algorithms 
to solve the linear programming problems (2.1) and (2.3). To lay the groundwork 
for discussing them, we next state two interesting results concerning the optimal 
solutions of a linear programming problem and its dual. 


Theorem 2.7 (Strictly complementary solutions) Assume \(A \in \mathbb{R}^{m \times n}\) is full row
rank, \(b \in \mathbb{R}^m\), and \(c \in \mathbb{R}^n\) are such that both (2.1) and (2.2) are feasible. Then
there exist optimal solutions \(x^*\) to (2.1) and \((y^*, s^*)\) to (2.3) such that


\[
x^* + s^* > 0.
\]

For a matrix A and a subset B of its columns, let \(A_B\) denote the submatrix of
A containing the columns in B. For a square non-singular matrix D, the notation
\(D^{-\top}\) stands for \((D^{-1})^\top\).


Theorem 2.8 (Optimal basic feasible solutions) Assume \(A \in \mathbb{R}^{m \times n}\) is full row
rank, \(b \in \mathbb{R}^m\), and \(c \in \mathbb{R}^n\) are such that both (2.1) and (2.2) are feasible. Then
there exists a partition \(B \cup N = \{1, \dots, n\}\) with \(|B| = m\) and \(A_B\) non-singular,
such that

\[
x_B^* = A_B^{-1} b, \qquad x_N^* = 0, \qquad y^* = A_B^{-\top} c_B
\]


are optimal solutions to (2.1) and (2.2) respectively.


*Algorithms for Linear Programming 


We next sketch the two main algorithmic schemes for solving linear programs, 
namely the simplex method and interior-point methods. Our discussion of these 
two important topics is only intended to give the reader a basic understanding 
of the main solution techniques for linear programming. For a more detailed and 
thorough discussion of these two classes of algorithms, see Bertsimas and
Tsitsiklis (1997), Boyd and Vandenberghe (2004), Chvátal (1983), Renegar (2001),
and Ye (1997).

We follow the usual convention of assuming that the problem of interest is in 
standard form as in (2.1) and (2.3) and A has full row rank. 


2.7.1 The Simplex Method


One of the most popular algorithms for linear programming is the simplex
method. It generates a sequence of iterates that satisfy \(Ax = b\), \(x \ge 0\),
\(A^\top y + s = c\), and \(x_i s_i = 0\) for \(i = 1, \dots, n\). Each iteration of the algorithm aims to make
progress towards satisfying \(s \ge 0\). Theorem 2.6 guarantees that the algorithm
terminates with an optimal solution when this goal is attained. The dual simplex
method is a variant that generates a sequence of iterates satisfying \(Ax = b\),
\(A^\top y + s = c\), \(s \ge 0\), and \(x_i s_i = 0\) for \(i = 1, \dots, n\). Each iteration of the
algorithm aims to make progress towards satisfying \(x \ge 0\).

The simplex method relies on the property stated in Theorem 2.8. The gist of
the method is to search for an optimal basis; that is, a subset \(B \subseteq \{1, \dots, n\}\) as
in Theorem 2.8. To motivate and describe the algorithm we next introduce some
terminology and key observations.

A basis is a subset \(B \subseteq \{1, \dots, n\}\) such that \(|B| = m\) and \(A_B\) is a non-singular
matrix. A basis B defines the basic solution \(\bar{x} = (\bar{x}_B, \bar{x}_N)\) where
\(\bar{x}_B = A_B^{-1} b\), \(\bar{x}_N = 0\). Observe that \(\bar{x}\) solves the system of equations \(Ax = b\). The
vector \(\bar{x}\) is a basic feasible solution if in addition \(\bar{x} \ge 0\). A basis B also defines
the reduced cost \(\bar{c} = c - A^\top A_B^{-\top} c_B\). The following fact suggests the main idea
for the simplex method.


Proposition 2.9 Assume \(B \subseteq \{1, \dots, n\}\) is a basis. Let \(\bar{x}\) and \(\bar{c}\) be respectively
the corresponding basic solution and reduced cost vector. If \(\bar{x} \ge 0\) and \(\bar{c} \ge 0\) then
\(\bar{x}\) is an optimal solution to (2.1). Furthermore, in this case \(\bar{y} = A_B^{-\top} c_B\) is an
optimal solution to (2.2).


An optimal basis is a basis that satisfies the conditions \(\bar{x} \ge 0\) and \(\bar{c} \ge 0\) stated
above. Given a basis B that is not optimal, the main idea of the simplex method
is to generate a better basis by replacing an index from B. To that end, a possible
avenue is as follows. Suppose B is a basis with a basic feasible solution \(\bar{x}\). If B
is not an optimal basis, then \(\bar{c}_j < 0\) for some \(j \notin B\). Thus for \(\alpha > 0\) the point
\(x(\alpha)\) defined by

\[
x_B(\alpha) := \bar{x}_B - \alpha A_B^{-1} A_j, \qquad x_j(\alpha) := \alpha, \qquad
x_i(\alpha) := 0 \;\text{ for } i \notin B \cup \{j\}
\]

satisfies

\[
c^\top x(\alpha) = c^\top \bar{x} + \alpha \bar{c}_j < c^\top \bar{x}.
\]

Hence we can get a point with better (lower) objective value than the current
basic feasible solution \(\bar{x}\). We would like this new point to remain feasible. Unless
the problem is unbounded, there is a length \(\alpha^* \ge 0\) that makes one of the current
basic components \(\ell\) of \(\bar{x}\) drop to zero while keeping all of them non-negative.
When \(\alpha^* > 0\), a basis with a better basic feasible solution can be obtained by
replacing \(\ell\) with j. The simplex method modifies the basis in this way, even in the
degenerate case when \(\alpha^* = 0\), which may occur in some iterations. Algorithm 2.1
gives a formal description of the simplex method.

Algorithm 2.1 The simplex method


1: start with a basis \(B \subseteq \{1, \dots, n\}\) such that \(\bar{x}\) is a basic feasible solution
2: while \(\bar{c} \not\ge 0\) do
3:     choose an index j such that \(\bar{c}_j < 0\)
4:     compute \(u = A_B^{-1} A_j\)
5:     if \(u \le 0\) then HALT; the problem is unbounded end if
6:     let \(\alpha^* := \min_{i \in B:\, u_i > 0} \bar{x}_i / u_i = \bar{x}_\ell / u_\ell\)
7:     form a new basis by replacing \(\ell\) with j
8:     update the basic feasible solution by replacing \(\bar{x}\) with \(x(\alpha^*)\)
9: end while


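Observe that the basic feasible solution \(\bar{x}\) and the reduced cost \(\bar{c}\) corresponding
to a basis B satisfy \(\bar{x}_N = 0\) and \(\bar{c}_B = 0\) where \(N = \{1, \dots, n\} \setminus B\). Hence the
simplex method only needs to keep track of \(\bar{x}_B\) and \(\bar{c}_N\). We next illustrate the
simplex method in the linear programming model from Example 2.2. If we start
with the initial basis B = {3,4,5} the algorithm proceeds as follows.


Iteration 1: B = {3,4,5}, \(\bar{x}_B = [\bar{x}_3 \; \bar{x}_4 \; \bar{x}_5]^\top = [100 \; 150 \; 360]^\top\),
    \(\bar{c}_N = [\bar{c}_1 \; \bar{c}_2]^\top = [-4 \; -3]^\top \not\ge 0\). Choose j = 1 as the new index to enter
    the basis. Compute \(u = [u_3 \; u_4 \; u_5]^\top = A_B^{-1} A_1 = [1 \; 2 \; 3]^\top\) and
    \(\alpha^* := \min_{i:\, u_i > 0} \bar{x}_i / u_i = 150/2 = \bar{x}_4 / u_4\). Hence \(\ell = 4\) is the index leaving the
    basis. Update the basis and basic feasible solution to B = {3,1,5} and
    \(\bar{x}_B = [\bar{x}_3 \; \bar{x}_1 \; \bar{x}_5]^\top = [25 \; 75 \; 135]^\top\).

Iteration 2: B = {3,1,5}, \(\bar{x}_B = [\bar{x}_3 \; \bar{x}_1 \; \bar{x}_5]^\top = [25 \; 75 \; 135]^\top\),
    \(\bar{c}_N = [\bar{c}_2 \; \bar{c}_4]^\top = [-1 \; 2]^\top \not\ge 0\). Choose j = 2 as the new index to enter
    the basis. Compute \(u = [u_3 \; u_1 \; u_5]^\top = A_B^{-1} A_2 = [1/2 \; 1/2 \; 5/2]^\top\)
    and \(\alpha^* := \min_{i:\, u_i > 0} \bar{x}_i / u_i = 25/(1/2) = \bar{x}_3 / u_3\). Hence \(\ell = 3\) is the index leaving the
    basis. Update the basis and basic feasible solution to B = {2,1,5} and
    \(\bar{x}_B = [\bar{x}_2 \; \bar{x}_1 \; \bar{x}_5]^\top = [50 \; 50 \; 10]^\top\).

Iteration 3: B = {2,1,5}, \(\bar{x}_B = [\bar{x}_2 \; \bar{x}_1 \; \bar{x}_5]^\top = [50 \; 50 \; 10]^\top\),
    \(\bar{c}_N = [\bar{c}_3 \; \bar{c}_4]^\top = [2 \; 1]^\top \ge 0\). Hence B is an optimal basis and
    \(\bar{x} = [50 \; 50 \; 0 \; 0 \; 10]^\top\)
    is an optimal solution.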


Notice how, geometrically, the simplex iterations move from one vertex of the 
feasible region to an adjacent vertex until an optimum solution is identified. See 
Figure 2.7.


Figure 2.7 Simplex iterations (in the plane of the first two variables, iteration 1
moves from the vertex (0,0) to (75,0) and iteration 2 moves on to (50,50))
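The iterations above can be reproduced with a short program. The following
Python sketch is an illustration of Algorithm 2.1 under stated assumptions, not
a production solver: it recomputes the basic solution and reduced cost from
scratch at every iteration, uses exact rational arithmetic, picks the entering
index with the most negative reduced cost (one possible choice for step 3), and
has no safeguard against cycling. Indices are 0-based, and the names `solve`
and `simplex` are hypothetical.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square system M z = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    T = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(i for i in range(col, n) if T[i][col] != 0)
        T[col], T[piv] = T[piv], T[col]
        T[col] = [v / T[col][col] for v in T[col]]
        for i in range(n):
            if i != col and T[i][col] != 0:
                T[i] = [vi - T[i][col] * vc for vi, vc in zip(T[i], T[col])]
    return [T[i][n] for i in range(n)]

def simplex(A, b, c, basis):
    """Algorithm 2.1 for min c'x s.t. Ax = b, x >= 0, from a feasible basis."""
    m, n = len(A), len(c)
    basis = list(basis)
    history = []
    while True:
        AB = [[A[i][j] for j in basis] for i in range(m)]
        ABt = [[AB[i][k] for i in range(m)] for k in range(m)]
        xB = solve(AB, b)                                  # basic solution A_B^{-1} b
        y = solve(ABt, [c[j] for j in basis])              # y = A_B^{-T} c_B
        cbar = [c[j] - sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
        history.append((list(basis), xB))
        entering = [j for j in range(n) if cbar[j] < 0]
        if not entering:                                   # reduced cost >= 0: optimal
            x = [Fraction(0)] * n
            for k, j in enumerate(basis):
                x[j] = xB[k]
            return x, history
        j = min(entering, key=lambda idx: cbar[idx])       # most negative reduced cost
        u = solve(AB, [A[i][j] for i in range(m)])         # u = A_B^{-1} A_j
        ratios = [(xB[k] / u[k], k) for k in range(m) if u[k] > 0]
        if not ratios:
            raise ValueError("the problem is unbounded")
        _, leave = min(ratios)                             # ratio test (step 6)
        basis[leave] = j

# Example 2.2 in standard form; B = {3, 4, 5} in the text is [2, 3, 4] here.
A = [[1, 1, 1, 0, 0], [2, 1, 0, 1, 0], [3, 4, 0, 0, 1]]
b = [100, 150, 360]
c = [-4, -3, 0, 0, 0]
x_opt, history = simplex(A, b, c, basis=[2, 3, 4])
for basis, xB in history:
    print([j + 1 for j in basis], [str(v) for v in xB])
```

Practical implementations update a factorization of the basis matrix instead of
solving three linear systems from scratch in each iteration.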


Dual Simplex Method 


The above version of the simplex method is a primal version that generates
primal feasible iterates and aims for dual feasibility. The dual simplex method
generates dual feasible iterates and aims for primal feasibility. The logic behind
the algorithm is similar. Suppose B is a basis with reduced cost \(\bar{c} \ge 0\). This
means that \(\bar{y} = A_B^{-\top} c_B\) is a dual feasible solution with slack \(\bar{c} = c - A^\top \bar{y} \ge 0\).
If B is not an optimal basis, then \(\bar{x}_\ell < 0\) for some \(\ell \in B\). Let \(e_\ell \in \mathbb{R}^m\) denote the
vector with \(\ell\)th component equal to 1 and all others equal to 0 (components
being indexed by the elements of B). For \(\alpha > 0\) the vector \(y(\alpha)\) defined by


\[
y(\alpha) = \bar{y} - \alpha A_B^{-\top} e_\ell
\]

satisfies

\[
b^\top y(\alpha) = b^\top \bar{y} - \alpha \bar{x}_\ell > b^\top \bar{y}.
\]

Observe that the slack of \(y(\alpha)\) is

\[
c(\alpha) = \bar{c} + \alpha A^\top A_B^{-\top} e_\ell.
\]


Hence we can get a point with better (higher) dual objective value than the current
dual feasible solution \(\bar{y}\). Unless the dual problem is unbounded, there is a length
\(\alpha^* \ge 0\) that makes one of the non-basic components j of \(c(\alpha)\) drop to zero while
keeping all of them non-negative. In this case a basis with a better dual solution
can be obtained by replacing \(\ell\) with j. Algorithm 2.2 gives a formal description
of the dual simplex method.

We illustrate the dual simplex in the following variation of Example 2.2.
Suppose we add the constraint


\[
6x_1 + 5x_2 \le 500.
\]

Algorithm 2.2 Dual simplex method


1: start with a basis \(B \subseteq \{1, \dots, n\}\) such that the reduced cost \(\bar{c}\) is non-negative
2: while \(\bar{x} \not\ge 0\) do
3:     choose an index \(\ell \in B\) such that \(\bar{x}_\ell < 0\)
4:     compute \(v = A^\top A_B^{-\top} e_\ell\)
5:     if \(v \ge 0\) then HALT; the dual problem is unbounded (the primal is infeasible) end if
6:     let \(\alpha^* := \min_{i:\, v_i < 0} \bar{c}_i / |v_i| = \bar{c}_j / |v_j|\)
7:     form a new basis by replacing \(\ell\) with j
8:     update the dual feasible solution by replacing \(\bar{y}\) with \(y(\alpha^*)\)
9: end while


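After adding the relevant new slack variable the new linear program is


\[
\begin{array}{rrrrrrrl}
\min & -4x_1 & - \; 3x_2 \\
\text{s.t.} & x_1 & + \; x_2 & + \; x_3 & & & & = 100 \\
& 2x_1 & + \; x_2 & & + \; x_4 & & & = 150 \\
& 3x_1 & + \; 4x_2 & & & + \; x_5 & & = 360 \\
& 6x_1 & + \; 5x_2 & & & & + \; x_6 & = 500 \\
& \multicolumn{7}{l}{x_1, x_2, x_3, x_4, x_5, x_6 \ge 0.}
\end{array}
\]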


If we start with the initial basis B = {1,2,5,6} the algorithm proceeds as 
follows. 


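Iteration 1: B = {1,2,5,6}, \(\bar{c}_N = [\bar{c}_3 \; \bar{c}_4]^\top = [2 \; 1]^\top \ge 0\), and
    \(\bar{x}_B = [\bar{x}_1 \; \bar{x}_2 \; \bar{x}_5 \; \bar{x}_6]^\top = [50 \; 50 \; 10 \; -50]^\top \not\ge 0\).
    Choose \(\ell = 6\) as the index to leave the basis. Compute
    \(v = A^\top A_B^{-\top} e_\ell = [0 \; 0 \; -4 \; -1 \; 0 \; 1]^\top\) and
    \(\alpha^* := \min_{i:\, v_i < 0} \bar{c}_i / |v_i| = 2/4 = \bar{c}_3 / |v_3|\). Hence j = 3
    is the index entering the basis. Update the basis, reduced cost, and basic
    solution respectively to B = {1,2,5,3}, \(\bar{c}_N = [\bar{c}_4 \; \bar{c}_6]^\top = [0.5 \; 0.5]^\top\),
    and \(\bar{x}_B = [\bar{x}_1 \; \bar{x}_2 \; \bar{x}_5 \; \bar{x}_3]^\top = [62.5 \; 25 \; 72.5 \; 12.5]^\top\). The new
    basic solution is non-negative and hence it is an optimal solution.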
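The same kind of sketch works for the dual simplex method. The following Python
code is an illustration of Algorithm 2.2 under stated assumptions: it recomputes
everything from the basis at each iteration rather than updating \(\bar{y}\)
incrementally, uses exact rational arithmetic, and chooses the most negative
basic component as the leaving index (one possible choice for step 3). Indices
are 0-based, and the names `solve` and `dual_simplex` are hypothetical.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve the square system M z = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    T = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(i for i in range(col, n) if T[i][col] != 0)
        T[col], T[piv] = T[piv], T[col]
        T[col] = [v / T[col][col] for v in T[col]]
        for i in range(n):
            if i != col and T[i][col] != 0:
                T[i] = [vi - T[i][col] * vc for vi, vc in zip(T[i], T[col])]
    return [T[i][n] for i in range(n)]

def dual_simplex(A, b, c, basis):
    """Algorithm 2.2 for min c'x s.t. Ax = b, x >= 0, from a dual feasible basis."""
    m, n = len(A), len(c)
    basis = list(basis)
    while True:
        AB = [[A[i][j] for j in basis] for i in range(m)]
        ABt = [[AB[i][k] for i in range(m)] for k in range(m)]
        xB = solve(AB, b)                                  # basic solution
        y = solve(ABt, [c[j] for j in basis])              # y = A_B^{-T} c_B
        cbar = [c[j] - sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
        if any(v < 0 for v in cbar):
            raise ValueError("the basis is not dual feasible")
        negative = [k for k in range(m) if xB[k] < 0]
        if not negative:                                   # primal feasible: optimal
            x = [Fraction(0)] * n
            for k, j in enumerate(basis):
                x[j] = xB[k]
            return x, basis, cbar
        leave = min(negative, key=lambda k: xB[k])         # position of l within B
        e = [1 if k == leave else 0 for k in range(m)]
        w = solve(ABt, e)                                  # w = A_B^{-T} e_l
        v = [sum(A[i][j] * w[i] for i in range(m)) for j in range(n)]   # v = A'w
        ratios = [(cbar[j] / -v[j], j) for j in range(n) if v[j] < 0]
        if not ratios:
            raise ValueError("dual unbounded, so the primal problem is infeasible")
        _, enter = min(ratios)                             # ratio test (step 6)
        basis[leave] = enter

# Example 2.2 with the added constraint 6 x1 + 5 x2 + x6 = 500.
A = [[1, 1, 1, 0, 0, 0],
     [2, 1, 0, 1, 0, 0],
     [3, 4, 0, 0, 1, 0],
     [6, 5, 0, 0, 0, 1]]
b = [100, 150, 360, 500]
c = [-4, -3, 0, 0, 0, 0]
x_opt, final_basis, cbar = dual_simplex(A, b, c, basis=[0, 1, 4, 5])   # B = {1, 2, 5, 6}
print([str(v) for v in x_opt], [j + 1 for j in final_basis])
```

Starting from the previous optimal basis is what makes the dual simplex method
attractive when a constraint is added to an already solved problem.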


Interior-Point Methods 


In contrast to the simplex method, interior-point methods generate a sequence
of iterates that satisfy \(x, s > 0\). Each iteration of the algorithm aims to make
progress towards satisfying \(Ax = b\), \(A^\top y + s = c\), and \(x_i s_i = 0\), \(i = 1, \dots, n\).

Throughout this section we use the following notational convention: Given
a vector \(x \in \mathbb{R}^n\), let \(X \in \mathbb{R}^{n \times n}\) denote the diagonal matrix defined by
\(X_{ii} = x_i\), \(i = 1, \dots, n\), and let \(\mathbf{1} \in \mathbb{R}^n\) denote the vector whose components are all ones.

The optimality conditions (2.9) can be restated as


\[
\begin{bmatrix} A^\top y + s - c \\ Ax - b \\ XS\mathbf{1} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \qquad x, s \ge 0.
\]


Given \(\mu > 0\), let \((x(\mu), y(\mu), s(\mu))\) be the solution to the following perturbed
version of the above optimality conditions:


\[
\begin{bmatrix} A^\top y + s - c \\ Ax - b \\ XS\mathbf{1} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \mu \mathbf{1} \end{bmatrix}, \qquad x, s > 0.
\]


The first condition above can be written as \(r_\mu(x, y, s) = 0\) for the residual vector


\[
r_\mu(x, y, s) := \begin{bmatrix} A^\top y + s - c \\ Ax - b \\ XS\mathbf{1} - \mu \mathbf{1} \end{bmatrix}.
\]


The central path is the set \(\{(x(\mu), y(\mu), s(\mu)) : \mu > 0\}\). It is intuitively clear that
\((x(\mu), y(\mu), s(\mu))\) converges to an optimal solution to both (2.1) and (2.3) as \(\mu\)
goes to 0. This suggests the following algorithmic strategy: Suppose \((x, y, s)\) is
“near” \((x(\mu), y(\mu), s(\mu))\) for some \(\mu > 0\). Use \((x, y, s)\) to move to a better point
\((x^+, y^+, s^+)\) “near” \((x(\mu^+), y(\mu^+), s(\mu^+))\) for some \(\mu^+ < \mu\).

It can be shown that if a point \((x, y, s)\) is on the central path, then the
corresponding value of \(\mu\) satisfies \(x^\top s = n\mu\). Likewise, given \(x, s > 0\), define


\[
\mu(x, s) := \frac{x^\top s}{n}.
\]

To move from a current point \((x, y, s)\) to a new point, we use the so-called
Newton step; that is, the solution to the following system of equations:


\[
\begin{bmatrix} 0 & A^\top & I \\ A & 0 & 0 \\ S & 0 & X \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \\ \Delta s \end{bmatrix}
= \begin{bmatrix} c - A^\top y - s \\ b - Ax \\ \mu \mathbf{1} - XS\mathbf{1} \end{bmatrix}.
\tag{2.10}
\]
Algorithm 2.3 presents a template for an interior-point method. 


Algorithm 2.3 Interior-point method

1: choose \(x^0, s^0 > 0\)
2: for k = 0, 1, ... do
3:     solve the Newton system (2.10) for \((x, y, s) = (x^k, y^k, s^k)\) and \(\mu := 0.1\,\mu(x^k, s^k)\)
4:     choose a step length \(\alpha \in (0, 1]\) and set
       \((x^{k+1}, y^{k+1}, s^{k+1}) = (x^k, y^k, s^k) + \alpha(\Delta x, \Delta y, \Delta s)\)
5: end for

The step length \(\alpha\) in step 4 should be chosen so that \(x^{k+1}, s^{k+1} > 0\) and
the size of \(r_\mu(x^{k+1}, y^{k+1}, s^{k+1})\) is sufficiently smaller than that of
\(r_\mu(x^k, y^k, s^k)\). A line-search procedure such as the one described in Algorithm 2.4
is a popular strategy for choosing the step length \(\alpha\).


Algorithm 2.4 Line search to select the step length \(\alpha\)


1: let \(\alpha_{\max} := \max\{\alpha : (x^k, s^k) + \alpha(\Delta x, \Delta s) \ge 0\}\)
2: start with \(\alpha := 0.99\,\alpha_{\max}\)
3: while \(\|r_\mu((x^k, y^k, s^k) + \alpha(\Delta x, \Delta y, \Delta s))\| > (1 - 0.01\alpha)\,\|r_\mu(x^k, y^k, s^k)\|\) do
4:     \(\alpha := \alpha / 2\)
5: end while


In contrast to the primal and dual simplex methods, which in principle gen- 
erate either primal or dual feasible iterates and terminate after finitely many 
iterations, interior-point methods typically generate infeasible iterates and con- 
verge to the optimal solution in the limit. In practice, the convergence is so fast 
that in a few iterations the algorithm yields iterates that are within machine 
precision of an exact optimal solution. The algorithm can also be enhanced to 
detect infeasibility. It relies on the fact that when the primal or dual problem is 
infeasible, the norm of the residual r,,(x,y,s) cannot be driven to zero. 

When applied to Example 2.2 starting from \(x^0 = s^0 = [100 \; \cdots \; 100]^\top\), \(y^0 = 0\),
the above interior-point algorithm generates the following sequence of iterates.
(For ease of notation we only display the first two entries of each iterate.)


Iteration    0      1          2          ...    8          9
\(x_1\)      100    19.7084    17.2976    ...    49.9383    49.9962
\(x_2\)      100    57.3930    41.6795    ...    50.0436    50.0011

Notes 


The simplex method was developed by George Dantzig (1963). Clever implemen- 
tations of the simplex method, such as the revised simplex and simplex tableau, 
perform iterations far more efficiently than what a naive recalculation of the 
basic solution and reduced cost from scratch at each iteration would involve. 
Detailed discussions on these implementations and other related issues can be 
found in the books by Bertsimas and Tsitsiklis (1997) and Chvátal (1983). 

Interior-point methods for linear programming were introduced in a land- 
mark paper by Karmarkar (1984) and subsequently triggered a massive burst of 
research in optimization during the 1990s and early 2000s. The books by Renegar 
(2001), Ye (1997), and Roos et al. (2005) present the main developments on this 
topic.

State-of-the-art linear programming solvers such as CPLEX, MOSEK, Gurobi,
and others use implementations of both the simplex and interior-point methods. 
These solvers can easily solve linear programs with millions of variables and 
constraints. 


Exercises 


Exercise 2.1 Draw the feasible region of the following two-variable linear
program:

\[
\begin{array}{rl}
\max & 2x_1 - x_2 \\
& x_1 + x_2 \ge 1 \\
& x_1 - x_2 \le 0 \\
& 3x_1 + x_2 \le 6 \\
& x_1, x_2 \ge 0.
\end{array}
\]


Determine the optimal solution to this problem by inspection.

Exercise 2.2 Consider the following two-variable linear program:


\[
\begin{array}{rl}
\min & 2x_1 + 3x_2 \\
& x_1 + x_2 \ge 5 \\
& x_1 \ge 1 \\
& x_2 \ge 2.
\end{array}
\]


Prove that \(x^* = \begin{bmatrix} 3 \\ 2 \end{bmatrix}\) is an optimal solution by showing that the objective value
of any feasible solution is at least 12. Hint: Use an appropriate combination of
the constraints.


Exercise 2.3 Consider the linear programming problem 


\[
\begin{array}{rl}
\max & c^\top x \\
& Ax \le b \\
& x \ge 0,
\end{array}
\]


where 


ee ep oT _ [3 T 
reaal aj c'=(6 543 5 4]. 


Solve this problem using the following strategy: 


(a) Find the dual of the above primal linear program. The dual has only two 
variables. Solve the dual by inspection after drawing a graph of its feasible 
set. 

(b) Using the optimal solution to the dual problem and the optimality condi- 
tions, determine what primal constraints are binding and what primal vari- 
ables must be zero at an optimal solution. Using this information, determine 
the optimal solution to the primal linear program. 




Exercise 2.4 


(a) Give an example of a two-variable linear program that is infeasible. 
(b) Give an example of a two-variable linear program that is unbounded. 


Exercise 2.5 Consider the linear programming problem 


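\[
\begin{array}{rl}
\max & c_1 x_1 + \cdots + c_n x_n \\
\text{s.t.} & a_1 x_1 + \cdots + a_n x_n = b \\
& x_1, \dots, x_n \ge 0,
\end{array}
\]

where \(b > 0\) and \(c_j, a_j > 0\), \(j = 1, \dots, n\). Characterize the optimal solution(s) to
this problem. Could there be more than one?


Exercise 2.6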


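(a) Write the linear programming model in Example 2.1 (fund allocation) in
    standard form. More precisely, show that for suitable \(\tilde{A}, \tilde{b}, \tilde{c}, \tilde{x}\) the linear
    programming model in Example 2.1 can be rewritten as


    \[
    \begin{array}{rl}
    \min\limits_{\tilde{x}} & \tilde{c}^\top \tilde{x} \\
    \text{s.t.} & \tilde{A} \tilde{x} = \tilde{b} \\
    & \tilde{x} \ge 0.
    \end{array}
    \]


    Hint: Introduce additional variables.
(b) Show that the standard dual of the model in part (a), namely


    \[
    \begin{array}{rl}
    \max\limits_{\tilde{y}} & \tilde{b}^\top \tilde{y} \\
    \text{s.t.} & \tilde{A}^\top \tilde{y} \le \tilde{c},
    \end{array}
    \]


    is equivalent to (2.8).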


Exercise 2.7 Consider the linear programming problem (2.5). In principle the 
dual of this problem can be obtained as follows: First, rewrite it in standard 
form (2.1) by using slack variables. Then obtain the “standard” dual, as in (2.2), 
for the problem rewritten in this standard form. Prove that the dual problem 
obtained in this fashion is equivalent to (2.6). 


Exercise 2.8 Consider the following investment problem over T years, where 
the objective is to maximize the value of the investments in year T. We assume 
a perfect capital market with the same annual lending and borrowing rate r > 0 
each year. We also assume that exogenous investment funds b_t are available in
year t, for t = 1, ..., T. Let n be the number of possible investments. We assume
that each investment can be undertaken fractionally (between 0 and 1). Let a_{tj}
denote the cash flow associated with investment j in year t. Let c_j be the value of
investment j in year T (including all cash flows subsequent to year T discounted
at the interest rate r).

The linear program that maximizes the value of the investments in year T is
the following. Denote by x_j the fraction of investment j undertaken, and let y_t
be the amount borrowed (if negative) or lent (if positive) in year t:




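\[
\begin{aligned}
\max\ & \sum_{j=1}^{n} c_j x_j + y_T \\
\text{s.t.}\ & -\sum_{j=1}^{n} a_{1j} x_j + y_1 \le b_1 \\
& -\sum_{j=1}^{n} a_{tj} x_j - (1 + r)\, y_{t-1} + y_t \le b_t \quad \text{for } t = 2, \ldots, T \\
& 0 \le x_j \le 1 \quad \text{for } j = 1, \ldots, n.
\end{aligned}
\]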


(a) Write the dual of the above linear program. 

(b) Solve the dual linear program found in part (a). 

Hint: Note that some of the dual variables can be computed by backward 
substitution. 

(c) Write the complementary slackness conditions. 

(d) Deduce that the first T constraints in the primal linear program hold as 
equalities. 

(e) Use the complementary slackness conditions to show that the solution
obtained by setting x_j = 1 if c_j + \sum_{t=1}^{T} (1 + r)^{T-t} a_{tj} > 0, and x_j = 0
otherwise, is an optimal solution.

(f) Conclude that the above investment problem always has an optimal solution
where each investment is either undertaken completely or not at all. 


Exercise 2.9 Consider the following variation of Exercise 2.5 where there are 
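upper bounds u_i on each of the variables:

\[
\begin{aligned}
\max\ & c_1 x_1 + \cdots + c_n x_n \\
\text{s.t.}\ & a_1 x_1 + \cdots + a_n x_n \le b \\
& 0 \le x_i \le u_i \quad \text{for } i = 1, \ldots, n.
\end{aligned}
\]
Assume that b > 0 and all a_i, c_i, u_i are also strictly positive for i = 1, ..., n.
Furthermore, assume
\[
\frac{c_1}{a_1} > \frac{c_2}{a_2} > \cdots > \frac{c_n}{a_n}.
\]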


Write the problem in standard form and apply the simplex method to it. What 
steps will the simplex method take? In other words, in what order will the 
variables enter and leave the basis? 


Exercise 2.10 Install and get acquainted with CVX. This package is freely 
available and extremely easy to install and use. It can be downloaded from 
http://cvxr.com/cvx/download/. 

Write a MATLAB script that takes as inputs an m x n matrix A, an m- 
dimensional vector b, and an n-dimensional vector c and solves the optimization 
problem 
(a) Test your script on instances generated as follows:
>> m=1, n=5, b=1, c=ones(n,1), A=rand(m,n);
and
>> m=1, n=5, b=1, c=rand(n,1), A=ones(m,n);


Are the results consistent with your answer to Exercise 2.5?
(b) Test your script on instances generated as follows:
>> m=2, n=6, b=rand(m,1), c=rand(n,1), A=rand(m,n);
>> m=2, n=10, b=rand(m,1), c=rand(n,1), A=rand(m,n);
>> m=4, n=10, b=rand(m,1), c=rand(n,1), A=rand(m,n);
>> m=4, n=20, b=rand(m,1), c=rand(n,1), A=rand(m,n);
Do you notice anything peculiar about the number of non-zero entries in the 
optimal solution x in each case? 


Exercise 2.11 Use Theorem 2.4 to prove Theorem 2.5. To that end, proceed 
as follows. 


(a) (Farkas’s lemma) Consider the linear programming problem 


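\[
\begin{aligned}
\min\ & b^{\top} y \\
\text{s.t.}\ & A^{\top} y \le 0.
\end{aligned}
\]
Show that the dual of this problem is

\[
\begin{aligned}
\max\ & 0 \\
\text{s.t.}\ & A x = b \\
& x \ge 0.
\end{aligned}
\]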


Now apply Theorem 2.4. 
(b) (Gordan’s theorem) Proceed as in (a) but this time start with the linear 
programming problem 


\[
\begin{aligned}
\max\ & t \\
\text{s.t.}\ & A^{\top} y - \mathbf{1} t \ge 0.
\end{aligned}
\]


(c) (Stiemke’s theorem) Proceed as in (a) and (b) but this time start with the 
linear programming problem 


\[
\begin{aligned}
\max\ & t \\
\text{s.t.}\ & A x = 0 \\
& x - \mathbf{1} t \ge 0.
\end{aligned}
\]


Exercise 2.12 Use Theorem 2.5 to prove Theorem 2.4. 


Exercise 2.13 To break the circular argument in the above two exercises, 
prove Theorem 2.5 using the following hyperplane separation theorem: If S ⊆ R^n
is closed and convex and x ∉ S then there exists a hyperplane separating x and
S. That is, there exist a ∈ R^n \ {0} and b ∈ R such that a^T x < b < a^T y for all
y ∈ S.
Linear Programming Models:
Asset-Liability Management


This chapter presents a classical application of linear programming to covering 
known liabilities by constructing a dedicated fixed-income portfolio. When the 
liabilities span multiple years, the model assumes that the only sources of risk 
are changes in the term structure of interest rates. We also discuss a short-term 
financing problem. 


Dedication 


Consider the problem of funding a stream of liabilities that extends over the 
future. Assume the forecast of liabilities is accurate. This problem arises in 
certain practical situations such as the liabilities of a pension fund. It also 
arises in non-financial institutions planning acquisitions, expansion, or product 
development. A dedicated bond portfolio is a portfolio of bonds constructed today 
and whose cash flows offset the liabilities. 


Example 3.1 (Bond dedication) Suppose a pension fund needs to cover some 
liabilities in the next six years. Cash requirements (in million $) are: 


Year 1 2 3 4 5 6 


Required 100 200 800 100 800 1200 


Suppose the pension fund can invest in ten government bonds with the cash 
flows and current prices in Table 3.1. 

Find the least expensive portfolio of bonds whose cash flows will be sufficient 
to cover the cash requirements. Assume surplus cash can be carried from one
year to the next but earns no interest.


We can formulate this problem as the following linear programming model. 


Linear programming model for bond dedication 

Variables: 
x_j: amount of bond j in the portfolio, for j = 1, ..., 10;
s_t: surplus cash in year t, for t = 1, ..., 6.
Table 3.1

                     Year
            1     2     3     4     5     6    Price
Bond 1     10    10    10    10    10   110    109
Bond 2      7     7     7     7     7   107     94.8
Bond 3      8     8     8     8     8   108     99.5
Bond 4      6     6     6     6   106            93.1
Bond 5      7     7     7     7   107            97.2
Bond 6      5     5     5   105                  92.9
Bond 7     10    10   110                       110
Bond 8      8     8   108                       104
Bond 9      7   107                             102
Bond 10   100                                    95.2


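Objective:
\[
\min\ 109 x_1 + 94.8 x_2 + \cdots + 102 x_9 + 95.2 x_{10}.
\]
Constraints:
\[
\begin{aligned}
10 x_1 + 7 x_2 + \cdots + 7 x_9 + 100 x_{10} &= 100 + s_1 \\
10 x_1 + 7 x_2 + \cdots + 107 x_9 + s_1 &= 200 + s_2 \\
&\ \ \vdots \\
110 x_1 + 107 x_2 + 108 x_3 + s_5 &= 1200 + s_6 \\
x_j &\ge 0, \quad j = 1, \ldots, 10 \\
s_t &\ge 0, \quad t = 1, \ldots, 6.
\end{aligned}
\]

Notice that we can write the equality constraints also as

\[
\begin{aligned}
10 x_1 + 7 x_2 + \cdots + 7 x_9 + 100 x_{10} - s_1 &= 100 \\
10 x_1 + 7 x_2 + \cdots + 107 x_9 + s_1 - s_2 &= 200 \\
&\ \ \vdots \\
110 x_1 + 107 x_2 + 108 x_3 + s_5 - s_6 &= 1200 \\
x_j &\ge 0, \quad j = 1, \ldots, 10 \\
s_t &\ge 0, \quad t = 1, \ldots, 6,
\end{aligned}
\]

or as
\[
\begin{aligned}
10 x_1 + 7 x_2 + \cdots + 7 x_9 + 100 x_{10} - 100 &= s_1 \\
10 x_1 + 7 x_2 + \cdots + 107 x_9 + s_1 - 200 &= s_2 \\
&\ \ \vdots \\
110 x_1 + 107 x_2 + 108 x_3 + s_5 - 1200 &= s_6 \\
x_j &\ge 0, \quad j = 1, \ldots, 10 \\
s_t &\ge 0, \quad t = 1, \ldots, 6.
\end{aligned}
\]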
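To see how the surplus variables behave, the following Python sketch rolls the cash balance forward for a candidate portfolio and checks that every s_t is nonnegative. The candidate is built with a simple backward cash-matching heuristic, which is an assumption made here for illustration and not the method of the text: it yields a feasible portfolio that is generally not optimal, so its cost is only an upper bound on the optimal value of the linear program. The function names and the choice of one bond per maturity year are hypothetical.

```python
# Cash flows F[t][j] of bond j+1 in year t+1, prices, and liabilities (Table 3.1).
F = [
    [10, 7, 8, 6, 7, 5, 10, 8, 7, 100],    # year 1
    [10, 7, 8, 6, 7, 5, 10, 8, 107, 0],    # year 2
    [10, 7, 8, 6, 7, 5, 110, 108, 0, 0],   # year 3
    [10, 7, 8, 6, 7, 105, 0, 0, 0, 0],     # year 4
    [10, 7, 8, 106, 107, 0, 0, 0, 0, 0],   # year 5
    [110, 107, 108, 0, 0, 0, 0, 0, 0, 0],  # year 6
]
PRICE = [109, 94.8, 99.5, 93.1, 97.2, 92.9, 110, 104, 102, 95.2]
LIAB = [100, 200, 800, 100, 800, 1200]


def surpluses(F, liab, x):
    """Roll s_t = sum_j F[t][j] * x[j] + s_{t-1} - L_t forward from s_0 = 0."""
    s, prev = [], 0.0
    for t in range(len(liab)):
        prev = sum(F[t][j] * x[j] for j in range(len(x))) + prev - liab[t]
        s.append(prev)
    return s


def backward_match(F, liab, pick):
    """Heuristic: starting from the last year, buy just enough of the bond
    pick[t] (which must mature in year t+1) to cover what is still uncovered."""
    x = [0.0] * len(F[0])
    for t in reversed(range(len(liab))):
        covered = sum(F[t][j] * x[j] for j in range(len(x)))
        x[pick[t]] = max(0.0, liab[t] - covered) / F[t][pick[t]]
    return x


# Assumed choice: bonds 10, 9, 7, 6, 4, 1 cover years 1, ..., 6 (0-based indices).
pick = [9, 8, 6, 5, 3, 0]
x = backward_match(F, LIAB, pick)
s = surpluses(F, LIAB, x)
cost = sum(p * xj for p, xj in zip(PRICE, x))
feasible = all(v >= -1e-9 for v in s) and all(xj >= 0 for xj in x)
print("feasible:", feasible, "cost:", round(cost, 2))
```

Any vector x returned by an LP solver can be passed to `surpluses` in the same way to confirm that it satisfies the cash balance constraints.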


In general, for a given problem with liabilities projected over m points in time 
over the future, the stream of liabilities is a vector: 
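Date         1       2      ...      m

Required    L_1     L_2     ...     L_m


Suppose we can use n bonds with the following cash flows and prices:


Date         1         2       ...       m       Price
Bond 1     F_{11}    F_{21}    ...     F_{m1}     p_1
  ...
Bond j     F_{1j}    F_{2j}    ...     F_{mj}     p_j
  ...
Bond n     F_{1n}    F_{2n}    ...     F_{mn}     p_n


The linear programming formulation of the cash matching problem is as follows.


Linear programming model for bond dedication (general version)
Variables:

x_j: amount of bond j in the portfolio, for j = 1, ..., n;

s_t: surplus cash in year t, for t = 1, ..., m.


Linear programming model:

\[
\begin{aligned}
\min\ & \sum_{j=1}^{n} p_j x_j \\
\text{s.t.}\ & \sum_{j=1}^{n} F_{1j} x_j - s_1 = L_1 \\
& \sum_{j=1}^{n} F_{tj} x_j + s_{t-1} - s_t = L_t, \quad t = 2, \ldots, m \\
& x_j \ge 0, \quad j = 1, \ldots, n \\
& s_t \ge 0, \quad t = 1, \ldots, m.
\end{aligned}
\]

The problem can be written more concisely as follows:

\[
\begin{aligned}
\min\ & p^{\top} x \\
\text{s.t.}\ & F x + R s = L \\
& x \ge 0 \\
& s \ge 0,
\end{aligned}
\]
where
\[
F = \begin{bmatrix} F_{11} & \cdots & F_{1n} \\ \vdots & & \vdots \\ F_{m1} & \cdots & F_{mn} \end{bmatrix},
\quad
R = \begin{bmatrix}
-1 & 0 & \cdots & 0 & 0 \\
1 & -1 & \cdots & 0 & 0 \\
0 & 1 & \ddots & \vdots & \vdots \\
\vdots & & \ddots & -1 & 0 \\
0 & 0 & \cdots & 1 & -1
\end{bmatrix},
\quad
p = \begin{bmatrix} p_1 \\ \vdots \\ p_n \end{bmatrix},
\quad
L = \begin{bmatrix} L_1 \\ \vdots \\ L_m \end{bmatrix}.
\]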


Sensitivity Analysis 


As noted in Section 2.4, when a linear programming model is solved, the dual 
solution yields a great deal of sensitivity information, or information about what 
happens when data values are changed. Recall the sensitivity interpretation 
associated with the shadow price. Assume λ is the shadow price of a constraint:


If the right-hand side of a constraint changes by Δ, then the optimal
objective value changes by λ·Δ, as long as the change of the right-hand
side is within the allowable increase or decrease.


This concept is particularly insightful in the bond dedication problem. In a
nutshell, the sensitivity information of the linear optimization model leads to an
implied term structure, as we explain next. Recall our linear programming model
for portfolio dedication:

\[
\begin{aligned}
\min\ & \sum_{j=1}^{n} p_j x_j \\
\text{s.t.}\ & \sum_{j=1}^{n} F_{1j} x_j - s_1 = L_1 \\
& \sum_{j=1}^{n} F_{tj} x_j + s_{t-1} - s_t = L_t, \quad t = 2, \ldots, m \\
& x_j \ge 0, \quad j = 1, \ldots, n \\
& s_t \ge 0, \quad t = 1, \ldots, m.
\end{aligned}
\]

The shadow price of the constraint at time t is the extra amount of money needed
today to cover an extra unit of liability at time t. In other words, the shadow price
λ_t gives the discount factor for time t. The current portfolio therefore implies
the following term structure of interest rates:

\[
r_t = \frac{1}{\lambda_t^{1/t}} - 1.
\]

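As a small illustration of this conversion, the sketch below turns a list of shadow prices (discount factors) into the implied annual rates. The function name and the sample shadow prices are hypothetical; in practice the values of λ_t would be read from the solver's sensitivity report.

```python
def implied_rates(shadow_prices):
    """Convert discount factors lambda_t (t = 1, 2, ...) into rates
    r_t = lambda_t ** (-1/t) - 1."""
    return [lam ** (-1.0 / t) - 1.0
            for t, lam in enumerate(shadow_prices, start=1)]


# Hypothetical shadow prices, chosen so that the implied rates are 5% and 6%.
lams = [1 / 1.05, 1 / 1.06 ** 2]
rates = implied_rates(lams)
print([round(r, 6) for r in rates])
```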
Immunization 
Consider again the problem of covering a stream of liabilities (L_1, ..., L_m) due at
m different dates in the future. In principle, the stream of liabilities is equivalent
to a lump sum of cash today equal to its present value, obtained by discounting
the future liabilities. Setting aside an amount equal to this present value seems
simpler than constructing a dedicated portfolio. A problem with this approach
is that it is fully exposed to interest-rate risk. By contrast, a dedicated portfolio
is not subject to interest-rate risk since it matches the liabilities at the time they 
occur. Immunization is an approach that reduces interest-rate risk as compared 
to the simple-minded present value approach, but it does not completely protect 
against it as dedication would. The advantage is that immunized portfolios 
are typically cheaper than dedicated portfolios. The idea is simple: construct 
a portfolio with the same present value as the stream of liabilities, and further 
require that this present value has the same sensitivity to changes in interest 
rates as the stream of liabilities. 

More precisely, suppose r_1, ..., r_m is the term structure of risk-free interest
rates. This means that the value r_t is the yield on a risk-free zero-coupon bond
with maturity t. In other words, r_t is the interest rate that applies to money
invested between now and time t. By discounting each of the cash flows with the
appropriate discount rate, it follows that the present value (PV) of a stream of
cash flows (F_1, ..., F_m), where F_t occurs at time t, is

\[
PV = \frac{F_1}{1 + r_1} + \frac{F_2}{(1 + r_2)^2} + \cdots + \frac{F_m}{(1 + r_m)^m}.
\]

If interest rates shift by δ, we get
\[
PV(\delta) = \frac{F_1}{1 + r_1 + \delta} + \frac{F_2}{(1 + r_2 + \delta)^2} + \cdots + \frac{F_m}{(1 + r_m + \delta)^m}.
\]
Notice that
\[
PV(\delta) - PV \approx -\delta \left( \frac{F_1}{(1 + r_1)^2} + \frac{2 F_2}{(1 + r_2)^3} + \cdots + \frac{m F_m}{(1 + r_m)^{m+1}} \right).
\]

This motivates the following concept.


Definition 3.2 The Fisher-Weil dollar duration (DD) of the stream of cash
flows (F_1, ..., F_m) is

\[
DD := \sum_{t=1}^{m} \frac{t F_t}{(1 + r_t)^{t+1}}.
\]
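The first-order approximation can be checked numerically. The sketch below computes PV, PV(δ), and DD from the formulas above for a hypothetical three-year bond and a hypothetical term structure, and compares the exact change in present value with -δ·DD.

```python
def pv(flows, rates, shift=0.0):
    """Present value of F_1..F_m under annual compounding, rates shifted by `shift`."""
    return sum(f / (1 + r + shift) ** t
               for t, (f, r) in enumerate(zip(flows, rates), start=1))


def dollar_duration(flows, rates):
    """Fisher-Weil dollar duration: sum of t * F_t / (1 + r_t)^(t+1)."""
    return sum(t * f / (1 + r) ** (t + 1)
               for t, (f, r) in enumerate(zip(flows, rates), start=1))


flows = [10, 10, 110]        # hypothetical bond: 10% coupon, three years
rates = [0.05, 0.055, 0.06]  # hypothetical term structure
delta = 0.0001               # a shift of one basis point

exact = pv(flows, rates, delta) - pv(flows, rates)
approx = -delta * dollar_duration(flows, rates)
print(exact, approx)
```

For small shifts the two numbers agree closely; the gap grows with δ², which is what the second-order (convexity) term accounts for.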


An immunized portfolio is a portfolio of bonds whose present value and dollar
duration match those of the stream of liabilities. In optimization terms, this
corresponds to a portfolio that satisfies the following constraints:

\[
\sum_{j=1}^{n} PV_j x_j = PV_L, \qquad \sum_{j=1}^{n} DD_j x_j = DD_L.
\]


A closer look at the difference between PV and PV(δ) suggests that we can
get an even better matching of sensitivity to changes in the term structure by
looking at second-order terms:

\[
PV(\delta) - PV \approx -\delta \sum_{t=1}^{m} \frac{t F_t}{(1 + r_t)^{t+1}} + \frac{\delta^2}{2} \sum_{t=1}^{m} \frac{t (t + 1) F_t}{(1 + r_t)^{t+2}}.
\]

This leads to the so-called Fisher-Weil dollar convexity (DC) of (F_1, ..., F_m):
\[
DC := \sum_{t=1}^{m} \frac{t (t + 1) F_t}{(1 + r_t)^{t+2}},
\]
as well as the Fisher-Weil convexity (C):

\[
C := \frac{1}{PV} \sum_{t=1}^{m} \frac{t (t + 1) F_t}{(1 + r_t)^{t+2}}.
\]

A portfolio can therefore be further immunized by matching present value, dollar
duration, and dollar convexity:

\[
\begin{aligned}
\sum_{j=1}^{n} PV_j x_j &= PV_L \\
\sum_{j=1}^{n} DD_j x_j &= DD_L \\
\sum_{j=1}^{n} DC_j x_j &\ge DC_L.
\end{aligned}
\tag{3.1}
\]


Here the subindices j = 1, ..., n and L refer to the bonds and liabilities
respectively. Note that since having net positive convexity is favorable, the last
constraint is an inequality constraint.
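The constraints (3.1) are easy to check for a given portfolio. The sketch below is a hypothetical illustration under a flat 5% term structure: a single liability of 100 in year 2 is immunized with two zero-coupon bonds maturing in years 1 and 3. The holdings were solved by hand from the first two constraints; the function names and all data are assumptions made for the example.

```python
def pv_dd_dc(flows, rates):
    """Return (PV, DD, DC) of cash flows F_1..F_m under annual compounding."""
    pv = dd = dc = 0.0
    for t, (f, r) in enumerate(zip(flows, rates), start=1):
        pv += f / (1 + r) ** t
        dd += t * f / (1 + r) ** (t + 1)
        dc += t * (t + 1) * f / (1 + r) ** (t + 2)
    return pv, dd, dc


def satisfies_3_1(bonds, x, liabilities, rates, tol=1e-8):
    """Check the PV and DD equalities and the DC inequality of (3.1)."""
    target = pv_dd_dc(liabilities, rates)
    total = [0.0, 0.0, 0.0]
    for flows, xj in zip(bonds, x):
        for k, value in enumerate(pv_dd_dc(flows, rates)):
            total[k] += value * xj
    return (abs(total[0] - target[0]) <= tol
            and abs(total[1] - target[1]) <= tol
            and total[2] >= target[2] - tol)


# Hypothetical data: flat 5% curve, one liability, two zero-coupon bonds.
rates = [0.05, 0.05, 0.05]
liabilities = [0, 100, 0]
bonds = [[100, 0, 0], [0, 0, 100]]
# Holdings matching PV and DD for this flat curve (solved by hand).
x = [1 / (2 * 1.05), 1.05 / 2]
print(satisfies_3_1(bonds, x, liabilities, rates))
```

In this example the two-bond portfolio has strictly larger dollar convexity than the liability, so the inequality in (3.1) holds with slack.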

The immunization constraints (3.1) are generally less stringent than the bond
dedication constraints, namely

\[
\begin{aligned}
\sum_{j=1}^{n} F_{1j} x_j - s_1 &= L_1 \\
\sum_{j=1}^{n} F_{tj} x_j + s_{t-1} - s_t &= L_t, \quad t = 2, \ldots, m \\
x_j &\ge 0, \quad j = 1, \ldots, n \\
s_t &\ge 0, \quad t = 1, \ldots, m.
\end{aligned}
\tag{3.2}
\]

Indeed, if the surplus variables s_t, t = 1, ..., m, are all zero in (3.2), then
some straightforward algebra shows that any x_1, ..., x_n satisfying (3.2) also
satisfies (3.1).

The previous discussion assumes that interest is compounded at discrete time 
intervals, e.g., annually or semiannually. In some practical circumstances cash 
flows may occur at irregular times. In those cases it could be more convenient to 
assume that interest is continuously compounded. Suppose r_t is the continuously
compounded spot rate for a risk-free zero-coupon bond with maturity t. Then the
present value of a stream of cash flows (F_1, ..., F_m) is
\[
PV = \sum_{t=1}^{m} F_t e^{-r_t t}.
\]

Consequently its dollar duration is

\[
DD = \sum_{t=1}^{m} t F_t e^{-r_t t},
\]

and its dollar convexity is

\[
DC = \sum_{t=1}^{m} t^2 F_t e^{-r_t t}.
\]

A nice feature of continuous compounding is that the formulas for an irregular
stream of cash flows (F_{t_1}, ..., F_{t_m}) are very similar:

\[
\begin{aligned}
PV &= \sum_{i=1}^{m} F_{t_i} e^{-r_{t_i} t_i}, \\
DD &= \sum_{i=1}^{m} t_i F_{t_i} e^{-r_{t_i} t_i}, \\
DC &= \sum_{i=1}^{m} t_i^2 F_{t_i} e^{-r_{t_i} t_i}.
\end{aligned}
\]
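A minimal sketch of these formulas for cash flows at irregular times follows. The dates, cash flows, and spot rates are hypothetical, and the function name is an assumption for the example. The finite-difference check at the end illustrates that DD is minus the derivative of PV with respect to a parallel shift of the spot rates.

```python
import math


def pv_dd_dc_continuous(times, flows, rates):
    """Return (PV, DD, DC) for cash flows at arbitrary times t_i,
    discounted with continuously compounded spot rates r_{t_i}."""
    pv = dd = dc = 0.0
    for t, f, r in zip(times, flows, rates):
        discounted = f * math.exp(-r * t)
        pv += discounted
        dd += t * discounted
        dc += t * t * discounted
    return pv, dd, dc


times = [0.5, 1.25, 3.0]     # hypothetical payment dates, in years
flows = [5, 5, 105]          # hypothetical cash flows
rates = [0.03, 0.032, 0.04]  # hypothetical spot rates at those dates

pv0, dd0, dc0 = pv_dd_dc_continuous(times, flows, rates)

# Central finite difference of PV under a parallel shift h of all spot rates.
h = 1e-5
pv_up = pv_dd_dc_continuous(times, flows, [r + h for r in rates])[0]
pv_down = pv_dd_dc_continuous(times, flows, [r - h for r in rates])[0]
slope = (pv_up - pv_down) / (2 * h)
print(pv0, dd0, dc0, slope)
```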


The kind of immunization via duration and convexity enforced by the con- 
straints (3.1) provides hedging against parallel shocks in the term structure. 
This implicitly assumes a one-factor interest risk model. There are enhancements 
based on a multi-factor interest risk model. Two popular ones are the key-rate 
model and the shift-twist-butterfly model as discussed in Tuckman (2002). The
logic of immunization naturally extends to a multi-factor interest risk model. In 
such a context an immunized portfolio should be hedged against changes in each 
of the risk factors. 


Some Practical Details about Bonds 


There are certain details about the way bonds are quoted and traded in actual 
exchanges. The discussion below applies only to plain vanilla treasury bonds. For 
a more detailed discussion, see Fabozzi (2004) or Tuckman (2002). 
Principal Value, Coupon Payments, Clean and Dirty Prices


The principal value, or par, or principal of a bond is the amount that the issuer 
agrees to repay the bondholder. The term to maturity of a bond is the time 
remaining until principal payment. The maturity date of a bond is that date 
when the issuer will pay the principal. 

The coupon rate or nominal rate of a bond is the annual interest that the 
issuer pays the bondholder. Treasury bonds pay their coupons semiannually. For 
example, a bond with an 8% coupon rate and a principal of $1,000 will pay a 
$40 installment to the holder every six months. At the maturity date, it will pay 
the $40 installment plus the $1,000 principal. 

When an investor purchases a bond between coupon payments, the investor 
must compensate the seller of the bond for the coupon interest earned since the 
last coupon payment. This is called the accrued interest and is computed based 
on the proportion of time since the last coupon payment. The convention for 
United States treasuries is not to include the accrued interest in the price quote. 
This price is called the clean price or simply the price. It is customary to present 
the price quote as a percentage of the par value of the bond. The clean price 
plus the accrued interest is called the dirty price or full price. 

For example, suppose that on February 15, 2051 investor B buys a treasury
bond with $10,000 face value and a 5.5% coupon rate that matures on January 31,
2053. In this case the semiannual coupon payment is $275 = 2.75% of $10,000.
The last coupon date, January 31, 2051, was 15 days before the purchase, and
the current coupon period (January 31 to July 31, 2051) has 181 days, so the
accrued interest is

\[
\frac{15}{181} \cdot 275 = 22.79.
\]

Suppose the price quote on February 15, 2051 is 101.145. Then the full price of
the bond, that is, the price paid by the buyer to the seller, is $10,114.50 + $22.79
= $10,137.29.
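The same calculation can be sketched in Python. The coupon dates (January 31 and July 31) are inferred from the maturity date, and the actual-day count used here is an assumption consistent with the 15/181 fraction above; the function names are hypothetical.

```python
from datetime import date


def accrued_interest(face, coupon_rate, last_coupon, next_coupon, settlement):
    """Accrued interest as the fraction of the semiannual coupon earned so far."""
    semiannual_coupon = face * coupon_rate / 2
    days_since = (settlement - last_coupon).days
    days_in_period = (next_coupon - last_coupon).days
    return semiannual_coupon * days_since / days_in_period


def full_price(face, clean_quote, accrued):
    """Dirty price: clean quote (in percent of par) times face, plus accrued interest."""
    return face * clean_quote / 100 + accrued


ai = accrued_interest(10_000, 0.055,
                      last_coupon=date(2051, 1, 31),
                      next_coupon=date(2051, 7, 31),
                      settlement=date(2051, 2, 15))
fp = full_price(10_000, 101.145, ai)
print(round(ai, 2), round(fp, 2))
```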


Yield Curve and Term Structure 


Recall that the yield of a bond is the interest rate that makes the discounted 
value of the cash flows match the current price of the bond. By convention, 
the yield is quoted on an annual basis. The treasury yield curve is the curve of 
yields for on-the-run (most recently auctioned) treasuries. It should be noted 
that the yield curve is not the same as the term structure of interest rates. This 
is the case because treasuries with maturity greater than one year are not zero- 
coupon bonds. Indeed, the term structure of interest rates is actually a theoretical 
construct that must be estimated from actual bonds. 

There are various ways of estimating the term structure. A quick and dirty 
(perhaps too dirty) approach is to ignore the difference described above and to 
use the yield curve as a proxy for the term structure of interest rates. 

A second approach is to use the following bootstrapping approach: Use several 
coupon-bearing bonds with various maturities. Determine the spot rate implied 
by the bond with the shortest maturity. Use that knowledge to compute the spot
rate implied by the bond with the next shortest maturity and so on. For example,
suppose we have a 0.5-year 5.25% bill, a 1-year 5.75% note, and a 1.5-year 6%
note. For simplicity assume they all are trading at par. Let z_1, z_2, z_3 denote
one-half of the annualized 0.5-year, 1-year, and 1.5-year spot rates.

Using the 0.5-year bond, we readily get

\[
z_1 = 0.0525 \cdot 0.5 = 0.02625.
\]
Using $100 as par, the cash flows for the 1-year bond are

0.5-year: 0.0575 · 100 · 0.5 = 2.875
1-year: 0.0575 · 100 · 0.5 + 100 = 102.875.

Now compute its present value using the spot rates z_1, z_2 and equate that to its
current price:
\[
100 = \frac{2.875}{1 + z_1} + \frac{102.875}{(1 + z_2)^2}.
\]

Because we already know z_1, we can solve for z_2 and obtain
\[
z_2 = 0.028786.
\]

Repeat with the 1.5-year bond: Using $100 as par, the cash flows for the 1.5-year
bond are

0.5-year: 0.06 · 100 · 0.5 = 3
1-year: 0.06 · 100 · 0.5 = 3
1.5-year: 0.06 · 100 · 0.5 + 100 = 103.

Now compute its present value using the spot rates z_1, z_2, z_3 and equate that to
its current price:

\[
100 = \frac{3}{1 + z_1} + \frac{3}{(1 + z_2)^2} + \frac{103}{(1 + z_3)^3}.
\]
Because we already know z_1, z_2, we can solve for z_3 and obtain

\[
z_3 = 0.030063097.
\]
Thus the annualized spot rates are
\[
r_{0.5} = 0.0525, \quad r_1 = 0.057572, \quad r_{1.5} = 0.060126.
\]
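The bootstrapping steps above can be sketched as a short loop. The function below is a minimal illustration that assumes, as in the example, that every bond trades at par, pays semiannual coupons, and that there is exactly one bond for each half-year maturity; the function name is hypothetical.

```python
def bootstrap(coupon_rates, par=100.0):
    """Semiannual spot rates z_1, z_2, ... from par bonds.

    coupon_rates[k-1] is the annual coupon rate of the bond maturing
    after k half-years; each bond is assumed to trade at par.
    """
    z = []
    for k, c in enumerate(coupon_rates, start=1):
        coupon = par * c / 2
        # Present value of the first k-1 coupons, using rates already found.
        pv_coupons = sum(coupon / (1 + z[i]) ** (i + 1) for i in range(k - 1))
        # Solve par = pv_coupons + (par + coupon) / (1 + z_k)^k for z_k.
        z.append(((par + coupon) / (par - pv_coupons)) ** (1.0 / k) - 1.0)
    return z


z = bootstrap([0.0525, 0.0575, 0.06])
annualized = [2 * zk for zk in z]
print([round(zk, 6) for zk in z])
print([round(r, 6) for r in annualized])
```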


A third, and much more elaborate, approach to estimating the term structure
is to take into consideration all bonds with similar characteristics available
in the market and fit a regression model to them. This approach requires
advanced statistical techniques and is beyond the scope of this book. For a related
discussion see Campbell et al. (1997) and Heath et al. (1992).

It should also be noted that the previous two estimation approaches only give 
spot rates at specific points in time. The spot rates at other times can be obtained 
by interpolation. The simplest type of interpolation is piecewise linear. 
Other Cash Flow Problems


The dedication model discussed in Section 3.1 belongs to the broader class of 
cash flow problems. A firm faces a stream of both positive (inflows) and negative 
(outflows) flows of cash. The negative flows are considered liabilities that must 
be met when they occur. To meet the liabilities, the firm can purchase a variety 
of instruments each with a different cash flow pattern. 

The following short-term financing problem is of this kind. Corporations 
routinely face the problem of financing short-term cash commitments. Linear 
programming can help in figuring out an optimal combination of financial 
instruments to meet these commitments. For illustration, consider the following 
problem. For the sake of exposition, we keep the example small. 


Example 3.3 (Short-term financing) A company has the following short-term 
financing problem (net cash flow requirements are given in $1000s). 


Month            J      F      M      A      M      J

Net cash flow  -150   -100    200   -200     50    300


The company has the following sources of funds: 


• A line of credit of up to $100,000 at an interest rate of 1% per month.

• It can issue 90-day commercial paper bearing a total interest of 2% for the
3-month period.

• Each month excess funds can be invested at an interest rate of 0.3% per
month.


There are many questions that the company might want to answer. Is it eco- 
nomical to use the line of credit in some of the months? If so, when? How much? 
What interest payments will the company need to make between January and 
June? Linear programming gives us a mechanism for answering these questions 
quickly and easily. 


Linear programming model for short-term financing problem 


Variables: 


x_j: amount drawn from the line of credit in month j, for j = 1, ..., 5
y_j: amount of commercial paper issued in month j, for j = 1, ..., 3
z_j: excess funds in month j, for j = 1, ..., 6.

Objective:

\[
\max\ z_6.
\]
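Constraints: Cash balance constraints in each month and bounds on x_j, y_j,
and z_j:

\[
\begin{aligned}
x_1 + y_1 - z_1 &= 150 \\
x_2 + y_2 - 1.01 x_1 + 1.003 z_1 - z_2 &= 100 \\
x_3 + y_3 - 1.01 x_2 + 1.003 z_2 - z_3 &= -200 \\
x_4 - 1.02 y_1 - 1.01 x_3 + 1.003 z_3 - z_4 &= 200 \\
x_5 - 1.02 y_2 - 1.01 x_4 + 1.003 z_4 - z_5 &= -50 \\
- 1.02 y_3 - 1.01 x_5 + 1.003 z_5 - z_6 &= -300 \\
x_j &\le 100 \quad \text{for } j = 1, \ldots, 5 \\
x_j &\ge 0 \quad \text{for } j = 1, \ldots, 5 \\
y_j &\ge 0 \quad \text{for } j = 1, \ldots, 3 \\
z_j &\ge 0 \quad \text{for } j = 1, \ldots, 6.
\end{aligned}
\]

Solving this linear program using either Excel Solver or MATLAB CVX, we
obtain the following optimal solution:

\[
x^* = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 52 \end{bmatrix}, \quad
y^* = \begin{bmatrix} 150 \\ 100 \\ 151.944 \end{bmatrix}, \quad
z^* = \begin{bmatrix} 0 \\ 0 \\ 351.944 \\ 0 \\ 0 \\ 92.497 \end{bmatrix}.
\]
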

Thus the company can attain an optimal wealth of $92,497 in June. To achieve 
this, the company will issue $150,000 in commercial paper in January, $100,000 
in February and $151,944 in March. In addition, it will draw $52,000 from its 
line of credit in May. Excess cash of $351,944 in March will be invested for one
month.
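A reported solution can be sanity-checked by rolling the monthly cash balance forward. The sketch below assumes the usual cash balance equations for this example (credit repaid after one month at 1%, commercial paper repaid after three months at 2%, excess funds earning 0.3% per month); the function name is hypothetical and the borrowing amounts are the ones reported above, in $1000s.

```python
# Right-hand sides of the monthly cash balance constraints, January to June.
RHS = [150, 100, -200, 200, -50, -300]


def roll_forward(x, y, rhs=RHS):
    """Return the excess funds z_1..z_6 implied by credit draws x_1..x_5
    and commercial paper issues y_1..y_3."""
    z = []
    for t in range(6):
        inflow = 0.0
        if t < 5:
            inflow += x[t]                  # new draw on the line of credit
        if t < 3:
            inflow += y[t]                  # new commercial paper
        if t >= 1:
            inflow -= 1.01 * x[t - 1]       # repay last month's credit
            inflow += 1.003 * z[t - 1]      # last month's excess plus interest
        if t >= 3:
            inflow -= 1.02 * y[t - 3]       # repay paper issued 3 months ago
        z.append(inflow - rhs[t])
    return z


z = roll_forward(x=[0, 0, 0, 0, 52], y=[150, 100, 151.9441675])
print([round(v, 3) for v in z])
```

All entries of z are nonnegative (up to rounding), and the last one is the wealth in June.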


Figure 3.1 displays the Excel Solver sensitivity report for this model.
The key columns for sensitivity analysis are the “Reduced Cost” and “Shadow
Price” columns. Recall that the shadow price u of a constraint C has the following
interpretation:


If the right-hand side of the constraint C changes by an amount Δ, the
optimal objective value changes by u·Δ as long as Δ is within a certain
range.


The above sensitivity information allows us to perform various kinds of “what
if” analysis.


Variable Cells

                                  Final       Reduced    Objective    Allowable     Allowable
Cell    Name                      Value       Cost       Coefficient  Increase      Decrease
$I$8    credit J Amount           0           -0.003214  0            0.00321386    1E+30
$I$9    credit F Amount           0           -3.89E-16  0            3.8858E-16    1E+30
$I$10   credit M Amount           0           -0.007119  0            0.00711864    1E+30
$I$11   credit A Amount           0           -0.003151  0            0.00315085    1E+30
$I$12   credit M Amount           52          0          0            0.00311965    3.8096E-16
$I$13   commercial J Amount       150         0          0            0.00399754    0.00321386
$I$14   commercial F Amount       100         0          0            0.00318204    3.8858E-16
$I$15   commercial M Amount       151.9441675 0          0            3.8473E-16    0.0031603
$B$18   Surplus January           0           -0.003998  0            0.00399754    1E+30
$C$18   Surplus February          0           -0.00714   0            0.00714       1E+30
$D$18   Surplus March             351.9441675 0          0            0.00393091    0.0031603
$E$18   Surplus April             0           -0.003919  0            0.00391915    1E+30
$F$18   Surplus May               0           -0.007     0            0.007         1E+30
$G$18   Surplus June              92.49694915 0          1            1E+30         1

Constraints

                                  Final       Shadow     Constraint   Allowable     Allowable
Cell    Name                      Value       Price      R.H. Side    Increase      Decrease
$B$19   Total January             150         -1.037288  150          89.1718954    149.411765
$C$19   Total February            100         -1.0302    100          47.0588235    50.9803922
$D$19   Total March               -200        -1.02      -200         90.6832835    151.944167
$E$19   Total April               200         -1.016949  200          90.9553333    152.4
$F$19   Total May                 -50         -1.01      -50          48            52
$G$19   Total June                -300        -1         -300         92.4969492    1E+30


Figure 3.1 Sensitivity report for short-term financing model


• For example, assume that net cash flow in January were -200 (instead of
-150). By how much would the wealth of the company decrease at the
end of June? The answer is in the shadow price of the January constraint,
u = -1.0373. The right-hand side of the January constraint would go
from 150 to 200, an increase of Δ = 50, which is within the allowable
increase (89.17). So the wealth of the company in June would decrease by
1.0373 · 50,000 = $51,865.


• Now assume that net cash flow in March were 250 (instead of 200). By how
much would the wealth of the company increase at the end of June? Again,
the change Δ = -50 is within the allowable decrease (151.944), so we can
use the shadow price u = -1.02 to calculate the change in objective value.
The increase is (-1.02) · (-50) = 51, that is, $51,000.


• Assume that the negative net cash flow in January is due in part to the
purchase of a machine worth $100,000. The vendor allows the payment to
be made in June at an interest rate of 3% for the 5-month period. Would
the wealth of the company increase or decrease by using this option? What
if the interest rate for the 5-month period were 4%? The shadow price of the
January constraint is -1.0373. This means that reducing cash requirements
in January by $1 increases the wealth in June by $1.0373. In other words,
the break-even interest rate for the 5-month period is 3.73%. So, if the
vendor charges 3%, we should accept, but if the vendor charges 4% we should not.
Note that the analysis is valid since the amount Δ = -100 is within the
allowable decrease.
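The what-if calculations above follow one pattern: check that the change Δ lies within the allowable range, then multiply by the shadow price. The sketch below encodes that pattern with the numbers from the sensitivity report; the function name is hypothetical, and returning `None` outside the range is a design choice made here to signal that the report alone does not determine the answer.

```python
def objective_change(shadow_price, delta, allowable_increase, allowable_decrease):
    """Change in the optimal objective when a right-hand side changes by delta,
    or None when delta is outside the range in which the shadow price is valid."""
    if -allowable_decrease <= delta <= allowable_increase:
        return shadow_price * delta
    return None


# January: right-hand side goes from 150 to 200 (values in $1000s).
jan = objective_change(-1.037288, 50, 89.1718954, 149.411765)
# March: right-hand side goes from -200 to -250.
mar = objective_change(-1.02, -50, 90.6832835, 151.944167)
# A change of 100 in January exceeds the allowable increase.
too_big = objective_change(-1.037288, 100, 89.1718954, 149.411765)
# Break-even 5-month rate for deferring a January payment to June.
break_even = abs(-1.037288) - 1

print(jan, mar, too_big, round(break_even, 4))
```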
Next, let us consider the reduced costs. Recall that these are the shadow prices
of the upper and lower bounds placed directly on the variables. The reduced cost
of a variable is non-zero only when the variable is equal to one of its bounds.
Assume x is equal to its lower bound b and its reduced cost is c. There are two
useful interpretations of the reduced cost c.


• First, if the value of x is set to a value b + Δ for Δ > 0 instead of its optimal
value b then the objective value is changed by c·Δ. For example, what
would be the effect of financing part of the January cash needs through the
line of credit? The answer is in the reduced cost of the first variable. Because
this reduced cost -0.0032 is strictly negative, the objective function would
decrease. Specifically, each $1,000 financed through the line of credit in
January would result in a decrease of $3.20 in the wealth of the company in
June.

• The second interpretation of c is that its magnitude |c| is the minimum amount
by which the objective coefficient of x must be changed in order for the
variable x to move away from its bound in an optimal solution. For example,
consider the first variable again. Its value is zero in the current optimal
solution, with objective function z_6. However, if we changed the objective
to z_6 + 0.0032 x_1, it would now be optimal to use the line of credit in January.
In other words, the reduced cost on x_1 can be viewed as the minimum rebate
that the bank would have to offer (payable in June) to make it attractive
to use the line of credit in January.


3.6 Exercises 


Exercise 3.1 You need to create a portfolio to cover the following stream of 
liabilities for the next six future dates: 


Date 1 2 3 4 5 6 


Required 500 200 800 200 800 1200 


You may purchase the bonds in Table 3.2. 
The term structure of risk-free interest rates is: 


Date 1 2 3 4 5 6 
Rate 5.04% 5.94% 6.36% 7.18% 7.89% 8.39% 


(a) Formulate a linear programming model to find the lowest-cost long-only 
dedicated portfolio that covers the stream of liabilities with the bonds above. 
Assume surplus balances can be carried from one date to the next but earn 
no interest. What is the cost of your portfolio? What is the composition of 
your portfolio? 




Table 3.2

                   Year
Bond      1     2     3     4     5     6    Price
 1       10    10    10    10    10   110    109
 2        7     7     7     7     7   107     94.8
 3        8     8     8     8     8   108     99.5
 4        6     6     6     6   106            93.1
 5        7     7     7     7   107            97.2
 6        6     6     6   106                  96.3
 7        5     5     5   105                  92.9
 8       10    10   110                       110
 9        8     8   108                       104
10        6     6   106                       101
11       10   110                             107
12        7   107                             102
13      100                                    95.2


(b) Formulate a linear programming model to find the lowest-cost portfolio that 
matches the present value and dollar duration of the stream of liabilities. 
What is the cost of your portfolio? How do the two present values change 
if interest rates decrease by one percentage point? How do they change if 
interest rates increase by one percentage point? How do they change if the 
interest rates in dates 1 and 2 decrease by one percentage point, the rates in 
dates 3, 4, and 5 remain the same, and the rate in date 6 increases by one 
percentage point? 

(c) Use the linear programming sensitivity information from part (a) to deter- 
mine the implied term structure of interest rates. 


(d) Suppose that the stream of liabilities changes to: 


Date 1 2 3 4 5 6 


Required 100 200 800 500 800 1200 


Find the new optimal dedicated portfolio and determine the new implied 
term structure. Is it different from the one you obtained in part (c)? Can 
you provide an intuitive explanation for the difference or lack thereof? 


(e) Assume the liabilities occur at irregular time intervals: 


Date 1.25 2.5 3.5 4.5 5.75 6.5


Required 500 200 800 200 800 1200 


(i) Repeat part (b) for this irregular stream of liabilities. 
You will need to do some kind of interpolation to estimate the term
structure at the relevant times. You will also need to make an assumption 
about how to discount at irregular time intervals. 


(ii) Repeat part (a) for this irregular stream of liabilities. 


(f) Formulate a linear programming model to find the lowest-cost long-only
dedicated portfolio that covers the stream of liabilities: 


Date 1 2 3 4 5 6 


Required 500 


with the following new set of bonds: 


Bond      1     2     3     4     5     6    Price   Rating
 1       10    10    10    10    10   110    108       B
 2        7     7     7     7     7   107     94       B
 3        8     8     8     8     8   108     99       B
 4        6     6     6     6   106            92.7     B
 5        7     7     7     7   107            96.6     B
 6        6     6     6   106                  95.9     B
 7        5     5     5   105                  92.9     A
 8       10    10   110                       110       A
 9        8     8   108                       104       A
10        6     6   106                       101       A
11       10   110                             107       A
12        7   107                             102       A
13      100                                    95.2     A


This time assume that at most 50% of your portfolio’s value can be in 
bonds rated B. Again, assume surplus balances can be carried from one date 
to the next but earn no interest. What is the cost of your portfolio? What 
is the composition of your portfolio? 


Exercise 3.2 Suppose today is November 30, 2052. A pension fund will need to
cover the following stream of liabilities over the subsequent four years (in million
dollars):

5/31/53 | 11/30/53 | 5/31/54 | 11/30/54 | 5/31/55 | 11/30/55 | 5/31/56 | 11/30/56
   12   |    10    |   10    |    10    |    9    |    9     |    9    |    15


To cover these liabilities, the pension fund intends to use a portfolio comprised 
of the following 14 US treasury notes: 
Description Coupon Maturity date Clean price
US TREAS NTS 3.500% 05/31/2053 3.5 5/31/53 101.563 
US TREAS NTS 0.500% 05/31/2053 0.5 5/31/53 100.188 
US TREAS NTS 2.000% 11/30/2053 2 11/30/53 101.746 
US TREAS NTS 0.250% 11/30/2053 0.25 11/30/53 100.078 
US TREAS NTS 2.250% 05/31/2054 2.25 5/31/54 102.941 
US TREAS NTS 0.250% 05/31/2054 0.25 5/31/54 100.023 
US TREAS NTS 2.125% 11/30/2054 2.125 11/30/54 103.656 
US TREAS NTS 0.250% 11/30/2054 0.25 11/30/54 100.016 
US TREAS NTS 2.125% 05/31/2055 2.125 5/31/55 104.461 
US TREAS NTS 1.375% 11/30/2055 1.375 11/30/55 103.031 
US TREAS NTS 3.250% 05/31/2056 3.25 5/31/56 109.738 
US TREAS NTS 1.750% 05/31/2056 1.75 5/31/56 104.570 
US TREAS NTS 2.750% 11/30/2056 2.75 11/30/56 108.879 
US TREAS NTS 0.875% 11/30/2056 0.875 11/30/56 101.516 


(a) Compute the dirty (full) price of each of the above 14 bonds. For consistency,
assume today is November 30, 2052.

(b) Formulate a linear programming model to find the lowest-cost dedicated
portfolio that covers the stream of liabilities. To eliminate the possibility of
any interest risk, assume a 0% reinvestment rate on cash balances carried
from one date to the next. Assume no short sales are allowed. What is the
cost of your portfolio? What is the composition of your portfolio?

(c) Use the linear programming sensitivity information from part (b) to
determine the term structure of interest rates implied by the portfolio.


Exercise 3.3 Prove that, if $s_1 = s_2 = \cdots = s_m = 0$ in (3.2), each of the
immunization constraints in (3.1) is implied by the dedication constraints (3.2).


Exercise 3.4 A company will face the following cash requirements in the
next eight quarters (positive entries represent cash needs while negative entries
represent cash surpluses):

Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8
100 | 500 | 100 | -600 | -500 | 200 | 600 | -900

The company has three borrowing possibilities.

• A 2-year loan available at the beginning of Q1, with a 1% interest per quarter.

• The other two borrowing opportunities are available at the beginning of every
quarter: a 6-month loan with a 1.8% interest per quarter, and a quarterly
loan with a 2.5% interest for the quarter.

• Any surplus can be invested at a 0.5% interest per quarter.

Formulate a linear program that maximizes the wealth of the company at the
beginning of Q9.


Exercise 3.5 Generate the sensitivity report for Exercise 3.4 with your favorite 
LP solver. 


(a) Suppose the cash requirement in Q2 is 300 (instead of 500). How would this 
affect the wealth in Q9? 

(b) Suppose the cash requirement in Q2 is 100 (instead of 500). Can the sensi- 
tivity report be used to determine the wealth in Q9? 

(c) One of the company’s suppliers may allow deferred payments of $50 from 
Q3 to Q4. What would be the value of this? 


Exercise 3.6 A home buyer can combine several mortgage loans to finance 
the purchase of a house. Regulations impose limits on the amount that can be 
borrowed from certain sources as well as a limit on the total reimbursement each 
month. Let B be the borrowing needs and T the number of months over which the 
loans will be paid back. There are n different loan opportunities available. Loan 
$i$ has a fixed interest rate $r_i$, a length $T_i \le T$, and a maximum amount borrowed
$b_i$. The monthly payment on loan $i$ is not required to be the same every month,
but a minimum payment $m_i$ is required each month. Furthermore, we would like
the total monthly payment $p$ over all loans to be constant. Formulate a linear
program that finds a combination of loans that minimizes the home buyer’s cost
of borrowing.

Hint: In addition to variables $x_{it}$ for the payment on loan $i$ in month $t$, it may
be useful to introduce a variable for the amount of outstanding principal on loan
$i$ in month $t$.


3.7 Case Study 
Let y denote the current year. A municipality sends you the following liability 
stream (in million dollars): 
Date | Liability
12/15/y | 11
6/15/y+1 | 9
12/15/y+1 | 8
6/15/y+2 | 7
12/15/y+2 | 9
6/15/y+3 | 10
12/15/y+3 | 9
6/15/y+4 | 12
12/15/y+4 | 9
6/15/y+5 | 6
12/15/y+5 | 5
6/15/y+6 | 7
12/15/y+6 | 9
6/15/y+7 | 7
12/15/y+7 | 8
6/15/y+8 | 7

Questions 
1. Determine the current term structure of treasury rates, and find the present 
value, dollar duration, and dollar convexity of the stream of liabilities. Please 
explain the main steps (interest rates, discount factors, compounding, etc.) 
followed in your calculations. You can find current data on numerous websites 
such as 
http://finance.yahoo.com/bonds
http://fixedincome.fidelity.com/fi/FILanding

2. Identify at least 30 fixed-income assets that are suitable for a dedicated
portfolio. Use assets that are considered risk-free, e.g., US government
non-callable treasury bonds, treasury bills, or treasury notes. Display a succinct
summary of the main characteristics of the bonds you chose (prices, coupon
rates, maturity dates).

3. Formulate a linear programming model to find the lowest-cost dedicated
portfolio that covers the stream of liabilities. To eliminate the possibility
of any interest risk, assume a 0% reinvestment rate on cash balances carried
from one date to the next. Assume no short sales are allowed. What is the
cost of your portfolio? What is the composition of your portfolio?

4. Use the linear programming sensitivity information to determine the term
structure of interest rates implied by the portfolio. Use a plot to compare it
with the current term structure of treasury rates.

5. Formulate a linear programming model to find the lowest-cost portfolio that
matches the present value, dollar duration, and dollar convexity of the stream
of liabilities. Assume no short sales are allowed. What is the cost of your
portfolio? How much would you save by using this immunization strategy
instead of dedication? Is your portfolio immunized against non-parallel shifts
in the term structure? Explain why or why not.

6. Combine a cash matching strategy for the liabilities during the first three
years and an immunization strategy based on present value, duration and
convexity for the liabilities during the last five years. Compare the cost of
this portfolio with the cost of the two previous portfolios.

7. The municipality would like you to make a second bid: What is the best
dedicated portfolio of risk-free bonds you can create if short sales are allowed?
Did you find arbitrage opportunities? Did you take into consideration the
bid-ask spread? Did you set limits on the transaction amounts? Discuss the
practical feasibility of your solution.


4 Linear Programming Models: Arbitrage and Asset Pricing


In this chapter, we prove the fundamental theorem of asset pricing and we give
several applications, from arbitrage detection in the foreign exchange market, to
pricing of options, and clientele effects in bond portfolio management.


4.1 Arbitrage Detection in the Foreign Exchange Market


The foreign exchange market includes the trading of currencies. It is one of the
markets with the largest trading volume. Given two currencies at any particular time,
say the US dollar and the euro, there are two exchange rates between them: one
dollar will buy $r_1$ euros, and one euro will buy $r_2$ dollars. It is evident that an
arbitrage opportunity would arise if $r_1 r_2 > 1$, since one could simultaneously
convert 1 dollar into $r_1$ euros and the $r_1$ euros into $r_1 r_2 > 1$ dollars. These two
transactions would net $r_1 r_2 - 1$ dollars without any risk.

An interesting related question is: Can one detect a similar type of arbi- 
trage opportunity involving more than two currencies? In particular, consider 
the following hypothetical exchange rates among the currencies USD (US Dol- 
lars), EUR (Euros), GBP (British Pounds), AUD (Australian Dollars), and JPY 
(Japanese Yen). 


From \ To | USD | EUR | GBP | AUD | JPY
USD | 1 | 0.639 | 0.537 | 1.0835 | 98.89
EUR | 1.564 | 1 | 0.843 | 1.6958 | 154.773
GBP | 1.856 | 1.186 | 1 | 2.014 | 184.122
AUD | 0.9223 | 0.589 | 0.496 | 1 | 91.263
JPY | 0.01011 | 0.00645 | 0.00543 | 0.01095 | 1


A simple verification shows that there are no arbitrage opportunities involving 
only two currencies. However, could there be one involving more than two cur- 
rencies? Could you simply eyeball such an opportunity? If you cannot, can you 
prove that such an opportunity does not exist? 

We next show how to answer these questions using linear programming. For
convenience, use $i = 1,\dots,5$ to index the above five currencies USD, EUR,
GBP, AUD, and JPY in that order. We let $a_{ij}$ denote the exchange rate from
currency $i$ to currency $j$. For instance, $a_{34} = 2.014$ and $a_{25} = 154.773$. To model
a set of transactions with potential for arbitrage, consider the following decision
variables:

• $x_{ij}$: amount of currency $i$ converted to currency $j$.

• $y_k$: net amount of currency $k$ after all transactions.

These variables are related via the following constraints:

$$y_k = \sum_{i=1}^{5} a_{ik} x_{ik} - \sum_{j=1}^{5} x_{kj}, \qquad k = 1,\dots,5.$$


An arbitrage would exist if there is a set of transactions so that after all
transactions the net amount for each currency is non-negative and at least one of them
is strictly positive. To find such a set of transactions we could solve the following
linear programming problem:

$$\begin{array}{ll}
\max & y_1 \\
\text{s.t.} & y_k = \sum_{i=1}^{5} a_{ik} x_{ik} - \sum_{j=1}^{5} x_{kj}, \quad k = 1,\dots,5 \\
& x_{ij} \ge 0 \\
& y_k \ge 0.
\end{array}$$

However, if there is indeed an arbitrage opportunity, then the above problem
would be unbounded, because any arbitrage can be scaled up indefinitely. We can
easily amend the above model so that the arbitrage can be revealed by introducing
a bound on the objective function:

$$\begin{array}{ll}
\max & y_1 \\
\text{s.t.} & y_k = \sum_{i=1}^{5} a_{ik} x_{ik} - \sum_{j=1}^{5} x_{kj}, \quad k = 1,\dots,5 \\
& y_1 \le 1 \\
& x_{ij} \ge 0 \\
& y_k \ge 0.
\end{array}$$


Solving this linear programming model, we find that indeed there are arbitrage 
opportunities. However, to obtain $1 in arbitrage, we have to exchange about 
1669.172 US dollars into 1066.601 euros, then convert these euros into 899.1446 
pounds, then convert these pounds into 1810.877 Australian dollars, and finally 
change these Australian dollars into 1670.172 US dollars. The arbitrage oppor- 
tunity is so tight that, depending on the numerical precision used, a linear 
programming solver may not find it. Furthermore, even if a solver does find 
it, the tightness of the arbitrage may render it impractical when accounting for 
market frictions such as transaction costs. 
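
The cycle reported above can be cross-checked without a solver. The following is a minimal sketch that swaps the linear program for a brute-force enumeration of all simple currency cycles through USD (feasible only because there are five currencies); the names `RATES`, `cycle_gain`, and `profitable_cycles` are illustrative and not part of the text.

```python
from itertools import permutations

CURRENCIES = ["USD", "EUR", "GBP", "AUD", "JPY"]

# RATES[i][j]: units of currency j obtained for one unit of currency i
# (the hypothetical exchange-rate table from this section).
RATES = [
    [1.0,     0.639,   0.537,   1.0835,  98.89],
    [1.564,   1.0,     0.843,   1.6958,  154.773],
    [1.856,   1.186,   1.0,     2.014,   184.122],
    [0.9223,  0.589,   0.496,   1.0,     91.263],
    [0.01011, 0.00645, 0.00543, 0.01095, 1.0],
]


def cycle_gain(cycle):
    """Product of exchange rates along a closed cycle of currency indices."""
    gain = 1.0
    for src, dst in zip(cycle, cycle[1:] + cycle[:1]):
        gain *= RATES[src][dst]
    return gain


def profitable_cycles(start=0):
    """All simple cycles through `start` whose rate product exceeds 1."""
    others = [k for k in range(len(CURRENCIES)) if k != start]
    found = {}
    for length in range(1, len(others) + 1):
        for middle in permutations(others, length):
            cycle = (start,) + middle
            gain = cycle_gain(cycle)
            if gain > 1.0:
                found[cycle] = gain
    return found


cycles = profitable_cycles()
for cycle, gain in sorted(cycles.items(), key=lambda kv: -kv[1]):
    names = " -> ".join(CURRENCIES[k] for k in cycle + (cycle[0],))
    # 1 / (gain - 1) is the starting amount needed to net one unit.
    print(f"{names}: gain factor {gain:.6f}, start with {1 / (gain - 1):.1f}")
```

For the cycle USD, EUR, GBP, AUD, USD the gain factor is about 1.0006, which is why roughly 1669 dollars must be exchanged to net a single dollar. Enumeration grows factorially with the number of currencies, so the linear programming model remains the practical tool for larger markets.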
The Fundamental Theorem of Asset Pricing


One of the most widely studied problems in financial mathematics is the pricing 
of contingent claims. These are securities whose price depends on the value of 
another underlying security. Under the assumption of no arbitrage, the price of 
such a contingent claim should match the price of a portfolio that replicates the 
payoff of the contingent claim. This basic principle underlies the powerful option 
pricing machinery dating back to the pioneering work of Merton (1973) and Black 
and Scholes (1973). The absence of arbitrage and the replication argument can 
be cleverly stated in terms of a so-called risk-neutral probability measure. The 
latter concept can be equivalently stated in terms of a stochastic discount factor 
or a positive linear pricing rule. 

We next use linear programming duality to give a formal derivation of the
equivalence between the absence of arbitrage and the existence of a risk-neutral
probability measure for the special case of a simple economy in a single-period
framework. Assume the economy contains $m$ assets. Let $S_0 := [S_0^1 \; \cdots \; S_0^m]^T$
denote the vector of prices per share of the $m$ assets at time 0 (beginning of the
period). Assume there are $n$ possible states $\Omega = \{\omega_1,\dots,\omega_n\}$ at time 1 (end of
the period). Let $S_1(\omega_j) := [S_1^1(\omega_j) \; \cdots \; S_1^m(\omega_j)]^T$ denote the vector of prices
per share of the $m$ assets at time 1 in state $\omega_j$.

An arbitrage opportunity in this economy is an opportunity to make money 
without any cost and without any risk. Mathematically, an arbitrage opportunity 
is a portfolio of the m assets that has non-positive cost, yields non-negative 
payoffs in all future states, and in addition either has strictly negative cost 
or generates a strictly positive payoff in some future state. In other words, an 
arbitrage portfolio is a set of holdings $y_1,\dots,y_m$ in the $m$ assets such that

$$S_0^T y \le 0, \qquad S_1(\omega_j)^T y \ge 0, \quad j = 1,\dots,n,$$

and such that at least one of these inequalities is strict.
A positive linear pricing rule is a set of positive numbers $x_1,\dots,x_n$ such that

$$S_0^i = \sum_{j=1}^{n} S_1^i(\omega_j)\, x_j, \qquad i = 1,\dots,m.$$


Proposition 4.1 In the above single-period economy with $m$ assets and $n$ future
states there is no arbitrage if and only if there exists a positive linear pricing rule.


Proof Let $S := [S_1(\omega_1) \; \cdots \; S_1(\omega_n)]$, an $m \times n$ matrix. An arbitrage portfolio is
precisely a solution to the following system of inequalities:

$$\begin{bmatrix} S & -S_0 \end{bmatrix}^T y \gneq 0, \qquad (4.1)$$

that is, every component is non-negative and at least one component is strictly
positive. Similarly, a positive linear pricing rule is precisely a solution to the
following system:

$$Sx = S_0, \qquad x > 0. \qquad (4.2)$$

Observe that (4.2) has a solution if and only if the following system
has a solution:

$$\begin{bmatrix} S & -S_0 \end{bmatrix} u = 0, \qquad u > 0. \qquad (4.3)$$

Indeed, if $x$ solves (4.2) then $u = (x, 1)$ solves (4.3); conversely, if $u = (\bar{x}, t)$
solves (4.3) then $x = \bar{x}/t$ solves (4.2). Hence it suffices to show that (4.1) does
not have a solution if and only if (4.3) has a solution. This readily follows from
Theorem 2.5(c).


The existence of a positive linear pricing rule can be equivalently stated in
terms of a stochastic discount factor or in terms of a risk-neutral measure. In
both of these interpretations the set of future states $\Omega = \{\omega_1,\dots,\omega_n\}$ is seen as
a probability space. Assume $\Omega$ is endowed with a probability measure $P$. Then
the future payoff of each asset $i$ can be seen as a random variable $S_1^i : \Omega \to \mathbb{R}$. A
stochastic discount factor is a random variable $D : \Omega \to \mathbb{R}$ such that

$$S_0^i = \mathbb{E}(D S_1^i) = \sum_{j=1}^{n} D(\omega_j)\, S_1^i(\omega_j)\, P(\omega_j), \qquad i = 1,\dots,m.$$

For convenience, assume there is a risk-free asset in the above economy; that is,
an asset $i$ such that $S_0^i = 1$ and $S_1^i(\omega_j) = 1 + r$ for $j = 1,\dots,n$. A risk-neutral
probability measure is a probability measure $Q$ in the space $\Omega = \{\omega_1,\dots,\omega_n\}$
such that

$$S_0^i = \frac{1}{1+r}\,\mathbb{E}_Q(S_1^i) = \frac{1}{1+r} \sum_{j=1}^{n} S_1^i(\omega_j)\, Q(\omega_j), \qquad i = 1,\dots,m.$$

Here $\mathbb{E}_Q$ indicates that the expectation is taken with respect to the risk-neutral
probability measure $Q$, as opposed to the original probability measure $P$.
We can now formally state the fundamental theorem of asset pricing.


Theorem 4.2 (Fundamental theorem of asset pricing) Consider the above
single-period economy with $n$ future states and $m$ assets, one of which is risk-free. The
following conditions are equivalent:

(i) There are no arbitrage opportunities.
(ii) There exists a positive linear pricing rule.
(iii) There exists a positive stochastic discount factor.
(iv) There exists a risk-neutral probability measure.

Proposition 4.1, which gives the equivalence between (i) and (ii), provides the
crux of the proof of Theorem 4.2. The proofs of the other equivalences are a
straightforward exercise.
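
To make the equivalences (ii), (iii), and (iv) concrete, the following sketch computes a positive linear pricing rule $x$ for a small two-asset, two-state economy and converts it into a stochastic discount factor via $D(\omega_j) = x_j / P(\omega_j)$ and into a risk-neutral measure via $Q(\omega_j) = (1+r)\,x_j$. All numbers (the rate, the prices, and the actual probabilities `P`) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F

r = F(1, 10)                      # risk-free rate (hypothetical)
S0 = [F(1), F(100)]               # time-0 prices: risk-free asset, risky asset
S1 = [[1 + r, 1 + r],             # S1[i][j]: payoff of asset i in state j
      [F(132), F(88)]]
P = [F(3, 5), F(2, 5)]            # actual state probabilities (hypothetical)

# Positive linear pricing rule: solve S1 x = S0 by Cramer's rule (2 x 2 case).
det = S1[0][0] * S1[1][1] - S1[0][1] * S1[1][0]
x = [(S0[0] * S1[1][1] - S1[0][1] * S0[1]) / det,
     (S1[0][0] * S0[1] - S0[0] * S1[1][0]) / det]

# Stochastic discount factor and risk-neutral measure implied by x.
D = [x[j] / P[j] for j in range(2)]
Q = [(1 + r) * x[j] for j in range(2)]

for i in range(2):
    via_D = sum(D[j] * S1[i][j] * P[j] for j in range(2))
    via_Q = sum(S1[i][j] * Q[j] for j in range(2)) / (1 + r)
    print(f"asset {i}: S0 = {S0[i]}, E_P[D S1] = {via_D}, E_Q[S1]/(1+r) = {via_Q}")
```

Exact rational arithmetic is used so that the three pricing formulas can be compared for equality rather than up to rounding.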


One-Period Binomial Pricing Model 


This section illustrates the pricing of a contingent claim on an underlying risky
security in a simple one-period binomial model. This model provides the building
block for the powerful and widely used multi-period binomial pricing model that
we will discuss in Chapter 15.

Consider a single-period economy with a risk-free asset and a risky asset. Let
$r$ denote the risk-free rate and $S_0$ denote the price per share of the risky asset
at time 0. Assume there are two possible future states $\Omega = \{H, T\}$ at time 1.
Assume the price per share of the risky asset at time 1 is $S_1(H) = u \cdot S_0$ in state
$H$ and $S_1(T) = d \cdot S_0$ in state $T$ for some “up” and “down” factors $u > d > 0$:

$$S_0 \;\longrightarrow\; \begin{cases} S_1(H) = uS_0 \\ S_1(T) = dS_0 \end{cases}$$

In this economy there is no arbitrage if and only if $u > 1 + r > d$, and in this
case the risk-neutral probability measure should satisfy

$$S_0 = \frac{1}{1+r}\bigl(Q(H)S_1(H) + Q(T)S_1(T)\bigr) = \frac{S_0}{1+r}\bigl(Q(H)u + Q(T)d\bigr), \qquad 1 = Q(H) + Q(T).$$

Therefore,

$$Q(H) = \frac{1+r-d}{u-d}, \qquad Q(T) = \frac{u-(1+r)}{u-d}. \qquad (4.4)$$

It is customary to write $\tilde{p} := Q(H)$ and $\tilde{q} := 1 - \tilde{p} = Q(T)$ as shorthand for the
risk-neutral probabilities and $p = P(H)$ and $q = 1 - p = P(T)$ for the actual
probabilities.

Consider the problem of pricing a contingent claim on the risky asset with the
following payoff structure:

$$V_0 = \,? \;\longrightarrow\; \begin{cases} V_1(H) \\ V_1(T) \end{cases}$$

For example, the contingent claim could be a European call option, that is,
a contract with the following conditions. At time 1, the holder of the option has
the right, but not the obligation, to purchase a share of the risky asset, known
as the underlying security, for a prescribed amount, known as the strike price.
Thus the payoff of a European call option with strike $K$ is $V_1 = (S_1 - K)^+ :=
\max\{S_1 - K, 0\}$. The payoff structure of this option in our one-period binomial
model is as follows:

$$V_0 = \,? \;\longrightarrow\; \begin{cases} V_1(H) = (uS_0 - K)^+ \\ V_1(T) = (dS_0 - K)^+ \end{cases}$$

A European put option is a similar contract, except that it confers the right to
sell the underlying security for a prescribed strike price.
The fundamental theorem of asset pricing implies that the fair price $V_0$ of a
general contingent claim with payoffs $V_1(H)$ and $V_1(T)$ is

$$V_0 = \frac{1}{1+r}\bigl(\tilde{p}\,V_1(H) + \tilde{q}\,V_1(T)\bigr).$$

Furthermore, the binomial pricing model yields the following delta-hedging
formula to construct a portfolio of the underlying risky asset and the risk-free asset
that replicates the payoff of the contingent claim. At time 0 construct a portfolio
with $\Delta$ shares of the underlying risky asset and $B$ shares of the risk-free asset
where

$$\Delta = \frac{V_1(H) - V_1(T)}{S_1(H) - S_1(T)} = \frac{V_1(H) - V_1(T)}{S_0(u-d)}, \qquad B = \frac{uV_1(T) - dV_1(H)}{(1+r)(u-d)}.$$

A straightforward verification shows that this portfolio replicates the payoff of
the contingent claim. That is, the payoff of the portfolio $(\Delta, B)$ is as follows:

$$\Delta S_0 + B \;\longrightarrow\; \begin{cases} \Delta S_1(H) + (1+r)B = V_1(H) \\ \Delta S_1(T) + (1+r)B = V_1(T) \end{cases}$$

Thus the value of this replicating portfolio at time 0 must be $V_0$ to rule out
arbitrage. Indeed, the value of the replicating portfolio at time 0 is

$$\Delta S_0 + B = \frac{(1+r)\bigl(V_1(H) - V_1(T)\bigr) + uV_1(T) - dV_1(H)}{(1+r)(u-d)} = \frac{1}{1+r}\bigl(\tilde{p}\,V_1(H) + \tilde{q}\,V_1(T)\bigr) = V_0.$$

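
The delta-hedging formula can be checked numerically. The sketch below computes $(\Delta, B)$ for a call option and confirms that the portfolio reproduces both payoffs and that its time-0 value matches the risk-neutral price. The function name `replicating_portfolio` and the input values are hypothetical, chosen only to exercise the formulas.

```python
def replicating_portfolio(S0, u, d, r, payoff_up, payoff_down):
    """Return (delta, B, V0) for a claim in the one-period binomial model."""
    if not (u > 1 + r > d > 0):
        raise ValueError("need u > 1 + r > d > 0 to rule out arbitrage")
    delta = (payoff_up - payoff_down) / (S0 * (u - d))
    B = (u * payoff_down - d * payoff_up) / ((1 + r) * (u - d))
    V0 = delta * S0 + B           # time-0 value of the replicating portfolio
    return delta, B, V0


# Hypothetical inputs: S0 = 100, u = 1.2, d = 0.9, r = 5%, call struck at 100.
S0, u, d, r, K = 100.0, 1.2, 0.9, 0.05, 100.0
V_up, V_down = max(u * S0 - K, 0.0), max(d * S0 - K, 0.0)
delta, B, V0 = replicating_portfolio(S0, u, d, r, V_up, V_down)

# Risk-neutral price for comparison, using (4.4).
p_tilde = (1 + r - d) / (u - d)
V0_risk_neutral = (p_tilde * V_up + (1 - p_tilde) * V_down) / (1 + r)

print(delta, B, V0, V0_risk_neutral)
```

A negative value of `B` means the hedge is financed by borrowing at the risk-free rate.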
Example 4.3 Suppose stock XYZ has share price $S_0 = 40$ today. Suppose the
share price of stock XYZ a month from today will either double or halve with
equal probabilities:

$$S_0 = 40 \;\longrightarrow\; \begin{cases} S_1(H) = 80 \\ S_1(T) = 20 \end{cases}$$


Assume also that the one-month risk-free rate is zero. Consider a European 
call option to buy one share of XYZ stock for $50 a month from today. What is 
the fair price of this option? 


In Example 4.3 we have $u = 2$, $d = \frac{1}{2}$, and $r = 0$. Thus, by (4.4), the risk-neutral
probabilities are $\tilde{p} = \frac{1}{3}$ and $\tilde{q} = \frac{2}{3}$. Next, observe that a month from now the call
option with strike price \$50 will be worth \$30 = \$80 - \$50 in the $H$ state and it
will be worthless in the $T$ state. Thus the fair price of the option is the price of
the following contract:

$$V_0 = \,? \;\longrightarrow\; \begin{cases} (80 - 50)^+ = 30 \\ (20 - 50)^+ = 0 \end{cases}$$

The fundamental theorem of asset pricing implies that the fair price of this
contract is

$$30 \cdot \tilde{p} + 0 \cdot \tilde{q} = 30 \cdot \frac{1}{3} = 10.$$

Furthermore, from the delta-hedging formula it follows that a replicating
portfolio can be constructed by buying $\frac{1}{2}$ share of stock XYZ and borrowing 10 shares of
the risk-free asset. Observe that the value of this replicating portfolio at time 0 is

$$\frac{1}{2} \cdot 40 - 10 = 10.$$

Using the risk-neutral probability measure we can also price other derivative
securities on the XYZ stock. For example, consider a European put option on
the XYZ stock with strike price \$60 and with the same expiration date:

$$V_0 = \,? \;\longrightarrow\; \begin{cases} (60 - 80)^+ = 0 \\ (60 - 20)^+ = 40 \end{cases}$$

It readily follows that the fair price of this option is

$$0 \cdot \tilde{p} + 40 \cdot \tilde{q} = 40 \cdot \frac{2}{3} = \frac{80}{3}.$$
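
The figures of Example 4.3 can be reproduced in exact arithmetic. This is a short sketch in which the helper `fair_price` is an illustrative name; it simply evaluates the discounted risk-neutral expectation for an arbitrary payoff function.

```python
from fractions import Fraction as F

S0, u, d, r = F(40), F(2), F(1, 2), F(0)

# Risk-neutral probabilities from (4.4).
p_tilde = (1 + r - d) / (u - d)
q_tilde = 1 - p_tilde


def fair_price(payoff):
    """Discounted risk-neutral expectation of payoff(S1)."""
    return (p_tilde * payoff(u * S0) + q_tilde * payoff(d * S0)) / (1 + r)


call_50 = fair_price(lambda s: max(s - 50, F(0)))
put_60 = fair_price(lambda s: max(60 - s, F(0)))
print(p_tilde, q_tilde, call_50, put_60)   # 1/3 2/3 10 80/3
```

Note that the actual probabilities (one half each) never enter the computation; only $u$, $d$, and $r$ do.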


Observe that in the one-period binomial pricing model the risk-neutral prob- 
ability is unique and the payoff of any contingent claim can be replicated via 
delta-hedging. In general, uniqueness of the risk-neutral probability corresponds 
to completeness of the market. The latter concept means that the payoff of any 
contract can be replicated with a portfolio of the existing underlying assets in 
the economy as detailed in Exercise 4.6. 


Static Arbitrage Bounds 


The no-arbitrage approach discussed in Section 4.2 has the drawback that it 
assumes only a finite number of possible future states. In this section, we do not 
make this assumption. Instead, we assume that there is a finite set of derivative 
securities written on the same underlying asset and with the same maturity. 
We show how the no-arbitrage approach can be used to obtain so-called static 
arbitrage bounds on the price of a new derivative security implied by the prices 
of the other derivative securities. As in Section 4.2, the gist of this approach is 
to use linear programming to detect arbitrage opportunities in a single-period 
economy. This discussion is based on Herzel (2005). 

Consider an underlying security with a (random) price $S_T$ at a future time $T$.
Consider $n$ derivative securities written on this security that mature at time $T$,
and have piecewise linear payoff functions $\Psi_i(S_T)$, each with a single breakpoint
$K_i$, for $i = 1,\dots,n$. The obvious motivation is the collection of calls and puts
written on the underlying security with strike prices $K_i$, $i = 1,\dots,n$. More
precisely, if the $i$th derivative security were a European call with strike price $K_i$,
we would have $\Psi_i(S_T) = (S_T - K_i)^+$. If it were a European put with strike price
$K_i$, we would have $\Psi_i(S_T) = (K_i - S_T)^+$.

We shall assume without loss of generality that the $K_i$s are in increasing order.
Also, we let $p_i$ denote the current price of the $i$th derivative security. Consider a
portfolio $x = [x_1 \; \cdots \; x_n]^T$ of the derivative securities 1 to $n$ and let $\Psi^x(S_T)$
denote the payoff function of the portfolio:

$$\Psi^x(S_T) = \sum_{i=1}^{n} \Psi_i(S_T)\, x_i.$$

The cost of the portfolio $x$ is given by

$$\sum_{i=1}^{n} p_i x_i. \qquad (4.5)$$
i=1 


To determine whether there exists an arbitrage opportunity in the above set
of $n$ derivative securities, we consider the following question: Is it possible to
construct a portfolio of the derivative securities $1,\dots,n$ with negative cost and
whose payoff function $\Psi^x(S_T)$ at time $T$ is non-negative for all $S_T \in [0,\infty)$?
Since non-negativity of $\Psi^x(S_T)$ corresponds to “no future obligations” such a
portfolio would be an arbitrage opportunity.

Since all $\Psi_i(S_T)$s are piecewise linear, so is $\Psi^x(S_T)$, with breakpoints in
$K_1,\dots,K_n$. Note that a piecewise linear function is non-negative over $[0,\infty)$
if and only if it is non-negative at 0 and all the breakpoints, and if the slope
of the function is non-negative to the right of the largest breakpoint. In other
words, $\Psi^x(S_T)$ is non-negative for all $S_T \ge 0$ if and only if the following three
conditions hold:

1. $\Psi^x(0) \ge 0$.
2. $\Psi^x(K_j) \ge 0$ for $j = 1,\dots,n$.
3. The right derivative of $\Psi^x(S_T)$ at $K_n$ is non-negative.


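These three conditions can be written as the following system of linear
inequalities:

$$\begin{aligned}
\sum_{i=1}^{n} \Psi_i(0)\, x_i &\ge 0, \\
\sum_{i=1}^{n} \Psi_i(K_j)\, x_i &\ge 0, \quad j = 1,\dots,n, \\
\sum_{i=1}^{n} \bigl(\Psi_i(K_n + 1) - \Psi_i(K_n)\bigr)\, x_i &\ge 0.
\end{aligned} \qquad (4.6)$$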


Since all $\Psi_i(S_T)$s are piecewise linear, the quantity $\Psi_i(K_n + 1) - \Psi_i(K_n)$ gives
the right derivative of $\Psi_i(S_T)$ at $K_n$ and the expression in the last constraint is
the right derivative of $\Psi^x(S_T)$ at $K_n$. The system of linear inequalities (4.6) can
be more succinctly written as

$$Kx \ge 0$$

for

$$K = \begin{bmatrix}
\Psi_1(0) & \cdots & \Psi_n(0) \\
\Psi_1(K_1) & \cdots & \Psi_n(K_1) \\
\vdots & & \vdots \\
\Psi_1(K_n) & \cdots & \Psi_n(K_n) \\
\Psi_1(K_n + 1) - \Psi_1(K_n) & \cdots & \Psi_n(K_n + 1) - \Psi_n(K_n)
\end{bmatrix}.$$

It thus follows that the above type of arbitrage opportunity exists if and only if
the following problem has a solution:

$$Kx \ge 0, \qquad p^T x < 0.$$


Next, we focus on the special case where the derivative securities under
consideration are European call options with strikes $K_i$ for $i = 1,\dots,n$. In this case
$\Psi_i(S_T) = (S_T - K_i)^+$ and hence

$$\Psi_i(K_j) = (K_j - K_i)^+.$$

In this case, (4.6) can be written as

$$Ax \ge 0 \qquad (4.7)$$

for

$$A = \begin{bmatrix}
K_2 - K_1 & 0 & 0 & \cdots & 0 \\
K_3 - K_1 & K_3 - K_2 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
K_n - K_1 & K_n - K_2 & K_n - K_3 & \cdots & 0 \\
1 & 1 & 1 & \cdots & 1
\end{bmatrix}.$$

This formulation is obtained by removing the first two constraints of (4.6), which
are redundant in this particular case. Using this formulation, we obtain the
following theorem giving necessary and sufficient conditions for a set of call
option prices to contain no arbitrage opportunities.


Theorem 4.4 Let $K_1 < K_2 < \cdots < K_n$ denote the strike prices of European
call options written on the same underlying security with the same maturity. For
$i = 1,\dots,n$ let $p_i$ denote the price of the $i$th call option. There are no arbitrage
opportunities if and only if the prices $p_i$, $i = 1,\dots,n$, satisfy the following
conditions:

(i) $0 \le p_n \le p_{n-1} \le \cdots \le p_1$.
(ii) The piecewise linear function $C : [K_1, K_n] \to \mathbb{R}$ with breakpoints $K_1,\dots,K_n$
defined by $C(K_i) := p_i$, $i = 1,\dots,n$, is convex.
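
The two conditions of Theorem 4.4 are easy to test mechanically. The following sketch assumes the non-strict reading of condition (i), namely $0 \le p_n \le \cdots \le p_1$, and checks convexity by verifying that the slopes of consecutive segments of $C$ are non-decreasing. The function name and the sample prices are hypothetical.

```python
from fractions import Fraction as F


def calls_arbitrage_free(strikes, prices):
    """Check conditions (i) and (ii) of Theorem 4.4 for European call prices."""
    K = [F(k) for k in strikes]
    p = [F(v) for v in prices]
    if len(K) != len(p) or any(a >= b for a, b in zip(K, K[1:])):
        raise ValueError("strikes must be strictly increasing, one price each")
    # (i) prices are non-negative and non-increasing in the strike.
    monotone = p[-1] >= 0 and all(a >= b for a, b in zip(p, p[1:]))
    # (ii) convexity: slopes of consecutive segments are non-decreasing.
    slopes = [(p[i + 1] - p[i]) / (K[i + 1] - K[i]) for i in range(len(K) - 1)]
    convex = all(s <= t for s, t in zip(slopes, slopes[1:]))
    return monotone and convex


print(calls_arbitrage_free([30, 40, 50, 60], [12, 7, 3, 1]))   # True
print(calls_arbitrage_free([30, 40, 50, 60], [10, 7, 3, 1]))   # False
```

In the second call the slope drops from $-3/10$ to $-4/10$ between the first two segments, so $C$ is not convex even though the prices are decreasing.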


The previous approach can be further extended to infer both lower and upper
bounds on the current price $p_{\text{new}}$ of a new derivative with maturity $T$ and payoff
$\Psi_{\text{new}}(S_T)$ given prices of other derivatives on the same underlying security and
with the same maturity. As before, assume $\Psi_{\text{new}}(S_T)$ and $\Psi_i(S_T)$ are piecewise
linear functions each with a single breakpoint $K$ and $K_i$, $i = 1,\dots,n$,
respectively. Assume $K_1 < K_2 < \cdots < K_n$ and let $p_i$ denote the current price of the
$i$th derivative security.

Assume there is no arbitrage involving the $n$ derivatives with payoffs $\Psi_i(S_T)$,
for $i = 1,\dots,n$. The previous reasoning applied to the larger set of $n + 1$
derivatives shows that there is no arbitrage if and only if the following two
conditions hold:

• First, $p_{\text{new}} \ge p^T x$ for any portfolio $x = [x_1 \; \cdots \; x_n]^T$ such that
$\Psi_{\text{new}}(S_T) \ge \Psi^x(S_T)$ for all $S_T \ge 0$.

• Second, $p_{\text{new}} \le p^T x$ for any portfolio $x = [x_1 \; \cdots \; x_n]^T$ such that
$\Psi_{\text{new}}(S_T) \le \Psi^x(S_T)$ for all $S_T \ge 0$.

In words, the first condition states that the price of the new derivative has to be
at least as large as the price of any sub-replicating portfolio of the old securities.
Likewise, the second condition states that the price of the new derivative has
to be at most as large as the price of any super-replicating portfolio of the
old securities. The above two conditions automatically yield the following static
arbitrage bounds on $p_{\text{new}}$.

Lower bound:

$$\begin{array}{rl}
p^{\ell}_{\text{new}} := \max & p^T x \\
\text{s.t.} & \Psi_{\text{new}}(S_T) \ge \Psi^x(S_T) \text{ for all } S_T \ge 0.
\end{array}$$

Upper bound:

$$\begin{array}{rl}
p^{u}_{\text{new}} := \min & p^T x \\
\text{s.t.} & \Psi_{\text{new}}(S_T) \le \Psi^x(S_T) \text{ for all } S_T \ge 0.
\end{array}$$


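The piecewise linearity of $\Psi_{\text{new}}(S_T)$ and $\Psi_i(S_T)$, $i = 1,\dots,n$, implies that both
inequalities $\Psi_{\text{new}}(S_T) \ge \Psi^x(S_T)$ for all $S_T \ge 0$, and $\Psi_{\text{new}}(S_T) \le \Psi^x(S_T)$ for
all $S_T \ge 0$ can be formulated as a finite system of linear inequalities. Therefore,
both the upper and lower static arbitrage bounds can be formulated as linear
programming models. In particular, for the special case where $\Psi_i(S_T) = (S_T - K_i)^+$,
for $i = 1,\dots,n$, and $\Psi_{\text{new}}(S_T) = (S_T - K)^+$ with $K_1 < K < K_n$, the static
arbitrage upper bound $p^{u}_{\text{new}}$ on $p_{\text{new}}$ can be written as the following linear programming
model (Exercise 4.10):

$$\begin{array}{rl}
p^{u}_{\text{new}} := \min & p^T x \\
\text{s.t.} & Ax \ge b,
\end{array} \qquad (4.8)$$

where

$$A = \begin{bmatrix}
K_2 - K_1 & 0 & 0 & \cdots & 0 \\
K_3 - K_1 & K_3 - K_2 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
K_n - K_1 & K_n - K_2 & K_n - K_3 & \cdots & 0 \\
1 & 1 & 1 & \cdots & 1
\end{bmatrix}, \qquad
b = \begin{bmatrix}
(K_2 - K)^+ \\ (K_3 - K)^+ \\ \vdots \\ (K_n - K)^+ \\ 1
\end{bmatrix}.$$

The last row requires the slope of $\Psi^x(S_T)$ to the right of $K_n$ to be at least 1,
which is the slope of $\Psi_{\text{new}}(S_T)$ there because $K < K_n$.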
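
The data of model (4.8) can be assembled and a candidate super-replicating portfolio tested without a solver. The sketch below builds $A$ and $b$, taking the last entry of $b$ to be 1 (the right slope of the new call's payoff, an assumption stated here explicitly), and checks that holding half a unit of each of the two calls whose strikes bracket $K$ is feasible. Any feasible $x$ gives an upper estimate $p^T x$ of the optimal value; finding the minimum itself requires a linear programming solver. The market data and function names are hypothetical.

```python
from fractions import Fraction as F


def upper_bound_data(strikes, new_strike):
    """Build A and b of (4.8) for calls with increasing strikes K_1 < ... < K_n."""
    K = [F(k) for k in strikes]
    A = [[max(K[r] - K[i], F(0)) for i in range(len(K))] for r in range(1, len(K))]
    A.append([F(1)] * len(K))                        # slope to the right of K_n
    b = [max(K[r] - F(new_strike), F(0)) for r in range(1, len(K))]
    b.append(F(1))                                   # slope of the new call
    return A, b


def is_feasible(A, b, x):
    """True if A x >= b componentwise."""
    return all(sum(a * xi for a, xi in zip(row, x)) >= bi for row, bi in zip(A, b))


strikes, prices = [30, 40, 50, 60], [12, 7, 3, 1]    # hypothetical market data
A, b = upper_bound_data(strikes, new_strike=45)
x = [F(0), F(1, 2), F(1, 2), F(0)]                   # half of each neighbouring call
cost = sum(F(p) * xi for p, xi in zip(prices, x))
print(is_feasible(A, b, x), cost)                    # True 5
```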


Tax Clientele Effects in Bond Portfolio Management 


This section presents a model proposed by Ronn (1987) to elicit clientele effects 
induced by taxes in the bond market. Related models were also proposed by 
Hodges and Schaefer (1977) and Schaefer (1982). The crux of the model is to 
formulate a linear program that exploits the price differential of bonds given 
their after-tax cash flows. To do so, the model finds a long-short portfolio that 
simultaneously buys “underpriced” bonds and sells “overpriced” bonds while 
ensuring non-negative cash flows throughout the lives of the bonds. 

Next we describe the details of the model. Assume the bond market includes 
N bonds with the following characteristics: 


• The ask and bid prices of bond $j$ are $p_j^a$ and $p_j^b$, respectively, for $j = 1,\dots,N$.

• Each unit of bond $j$ generates a cash flow $a_t^j$ at date $t$ for $j = 1,\dots,N$
and $t = 1,\dots,T$. These cash flows are after-tax coupon and/or principal
payments.

• The minimal risk-free reinvestment rate at future dates $t = 1,\dots,T$ is $\rho$.


Linear programming model for tax clientele effects in the bond market


Variables:
$x_j^a$: number of units of bond $j$ bought, for $j = 1,\dots,N$
$x_j^b$: number of units of bond $j$ sold, for $j = 1,\dots,N$
$z_t$: surplus cash flow at date $t$, for $t = 1,\dots,T$.


Objective:

$$\max \; \sum_{j=1}^{N} p_j^b x_j^b - \sum_{j=1}^{N} p_j^a x_j^a.$$

Constraints: Cash balance constraints in each date and bounds on $x_j^a$, $x_j^b$,
and $z_t$:

$$\begin{aligned}
z_1 &= \sum_{j=1}^{N} a_1^j x_j^a - \sum_{j=1}^{N} a_1^j x_j^b \\
z_t &= (1+\rho)\, z_{t-1} + \sum_{j=1}^{N} a_t^j x_j^a - \sum_{j=1}^{N} a_t^j x_j^b, && \text{for } t = 2,\dots,T \\
x_j^a,\, x_j^b &\ge 0, && \text{for } j = 1,\dots,N \\
z_t &\ge 0, && \text{for } t = 1,\dots,T \\
x_j^a,\, x_j^b &\le 1, && \text{for } j = 1,\dots,N.
\end{aligned} \qquad (4.9)$$


Some comments are in order. The above objective function is the net difference
between the value of the short positions and long positions of the portfolio. The
short positions have to settle at the bid prices whereas the long positions have
to settle at the ask prices. Because of this distinction, the constraints $x_j^a, x_j^b \ge 0$
are required. To ensure that the portfolio is risk-free, we require the surplus cash
flows $z_t$ to be non-negative for each date $t$.

The resulting linear program admits two main types of solutions. If all
bonds are priced within the bid-ask spread, the optimal value of
the linear program is zero and it is trivially attained by not taking any short or
long positions. On the other hand, if there are exploitable price differentials in
the bonds, the linear program chooses long and short holdings so as to maximize
the difference between the values of the short and long positions. In that case
the optimal value is positive. To avoid unbounded values, the model includes the
upper bounds $x_j^a, x_j^b \le 1$ on the long and short holdings.
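
Before handing model (4.9) to a solver, it can help to evaluate a candidate long-short portfolio by hand. The sketch below rolls the cash balance constraints forward to obtain the surpluses $z_t$ and evaluates the objective; it does not optimize. The function names and the two-bond data are hypothetical.

```python
def evaluate_portfolio(ask, bid, cash_flows, x_buy, x_sell, rho):
    """Return (objective, surpluses) of model (4.9) for given holdings.

    cash_flows[t][j] is the after-tax cash flow of bond j at date t + 1.
    """
    objective = sum(pb * xs for pb, xs in zip(bid, x_sell)) - sum(
        pa * xb for pa, xb in zip(ask, x_buy))
    surpluses, previous = [], 0.0
    for flows in cash_flows:
        net = sum(a * (xb - xs) for a, xb, xs in zip(flows, x_buy, x_sell))
        previous = (1 + rho) * previous + net
        surpluses.append(previous)
    return objective, surpluses


def is_admissible(x_buy, x_sell, surpluses, tol=1e-9):
    """Check the bounds of (4.9): 0 <= x <= 1 and z_t >= 0."""
    in_bounds = all(-tol <= v <= 1 + tol for v in list(x_buy) + list(x_sell))
    return in_bounds and all(z >= -tol for z in surpluses)


# Two hypothetical bonds with identical cash flows but different quotes.
ask, bid = [100.5, 99.0], [100.0, 98.5]
cash_flows = [[4.0, 4.0], [104.0, 104.0]]
objective, surpluses = evaluate_portfolio(
    ask, bid, cash_flows, x_buy=[0, 1], x_sell=[1, 0], rho=0.01)
print(objective, surpluses, is_admissible([0, 1], [1, 0], surpluses))
```

Here the second bond's ask price lies below the first bond's bid price, so buying the second and selling the first yields a positive objective with all surpluses equal to zero.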

Note that the model requires bonds with perfectly forecastable cash flows. 
Thus, non-callable bonds and notes are deemed appropriate, but callable bonds 
are excluded. 

The proposed model explicitly accounts for the taxation of income and capital
gains for specific investor classes. This means that the cash flows need to be
adjusted for the presence of taxes. For a discount bond (that is, when $p_j^a < 100$),
the after-tax cash flow of bond $j$ at date $t$ is

$$a_t^j = c_t^j (1 - \tau),$$

where $c_t^j$ is the coupon payment at date $t$ and $\tau$ is the ordinary income tax rate.
At maturity, the after-tax cash flow of bond $j$ is

$$a_t^j = (100 - p_j^a)(1 - g) + p_j^a,$$

where $g$ is the capital gains tax rate.
On the other hand, for a premium bond (that is, when $p_j^a > 100$), the premium
is amortized against ordinary income over the life of the bond, giving rise to an
after-tax coupon payment of

$$a_t^j = \left(c_t^j - \frac{p_j^a - 100}{n_j}\right)(1 - \tau) + \frac{p_j^a - 100}{n_j},$$

where $n_j$ is the number of coupon payments remaining to maturity.
A premium bond also makes a non-taxable repayment of

$$a_t^j = 100$$

at maturity.
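
The tax adjustments above translate directly into two small helper functions. This is a sketch under the assumption that prices and coupons are quoted per 100 of face value; the function names and the sample tax rates are hypothetical, and whether a final coupon is added to the principal repayment at the maturity date is left to the caller.

```python
def after_tax_coupon(coupon, ask_price, tau, n_coupons):
    """After-tax coupon payment per 100 of face value.

    tau is the ordinary income tax rate; n_coupons is the number of coupon
    payments remaining to maturity (used only for premium bonds).
    """
    if ask_price <= 100:
        return coupon * (1 - tau)
    amortization = (ask_price - 100) / n_coupons   # premium written off per coupon
    return (coupon - amortization) * (1 - tau) + amortization


def after_tax_principal(ask_price, g):
    """After-tax repayment at maturity; g is the capital gains tax rate."""
    if ask_price < 100:
        return (100 - ask_price) * (1 - g) + ask_price
    return 100.0


# Discount bond bought at 95 and premium bond bought at 104 (hypothetical).
print(after_tax_coupon(2.0, 95.0, tau=0.30, n_coupons=8),
      after_tax_principal(95.0, g=0.15))
print(after_tax_coupon(3.0, 104.0, tau=0.30, n_coupons=8),
      after_tax_principal(104.0, g=0.15))
```

For a tax-exempt investor, setting `tau` and `g` to zero recovers the pre-tax coupon and a repayment of 100 in both cases.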

Major categories of taxable investors are domestic banks, insurance companies, 
individuals, non-financial corporations, and foreigners. In each case, one needs 
to distinguish the tax rates on capital gains versus ordinary income. 

As an example, consider tax-exempt investors. For this class of investors, 
Schaefer (1982) observed that the “purchased” portfolio contains high coupon 
bonds and the “sold” portfolio is dominated by low coupon bonds. This can be 
explained as follows: The preferential taxation of capital gains for (most) taxable 
investors causes them to gravitate towards low coupon bonds. Consequently, for 
tax-exempt investors, low coupon bonds are “overpriced” and not desirable as 
investment vehicles. 


Notes 


The fundamental theorem of asset pricing is central to the mathematical finance 
literature. The connection between arbitrage and risk-neutral pricing underlies 
the classical work of Merton (1973) and Black and Scholes (1973). More explicit 
and formal statements on the relation between absence of arbitrage and existence 
of stochastic discount factors in single-period as well as in multi-period settings 
were developed by Ross (1976), Harrison and Kreps (1979), and Harrison and 
Pliska (1981). The textbooks by Back (2010), Duffie (2001), and Shreve (2000) 
give a detailed treatment of this important topic. 


Exercises 


Exercise 4.1 The Excel spreadsheet “Exercise 4.1 FX model” gives cross- 
currency exchange rates among the currencies USD, EUR, GBP, AUD, and JPY. 
Use a linear programming model to detect if these exchange rates contain an 
arbitrage opportunity. To do so, use the following decision variables: 


$x_{ij}$: amount of currency $i$ converted to currency $j$.
$y_k$: net amount of currency $k$ after all transactions.


Is there an arbitrage opportunity? If the answer is yes, then describe it, for 
example: “Convert 1000 USD to EUR then to JPY then back to USD to net 
1 USD without putting money in.” 


Exercise 4.2 Let $S_0$ be the current share price of a “risky” security and
assume that there are two possible share prices for this security at a future
time $T$: $S_T(u) = S_0 \cdot u$ and $S_T(d) = S_0 \cdot d$, where $u > d > 0$. Assume there is
also a “risk-free” security with current share price 1 and future share price $1 + r$
at time $T$. Show that there is no arbitrage opportunity involving the risky and
risk-free securities if and only if $u > 1 + r > d$.


Exercise 4.3 Assume that the XYZ stock is currently priced at $40. At the 
end of the next period, the price of XYZ is expected to be in one of the following 
two states: $40 \cdot u$ or $40 \cdot d$. We know that d < 1 < 3 < u but we do not know
d or u. The interest rate is zero. If a European call option with strike price $50 
is priced at $10 while a European call option with strike price $40 is priced at 
$13, and we assume that these prices do not contain any arbitrage opportunities, 
what is the fair price of a European put option with a strike price of $40? 


Exercise 4.4 Assume that the XYZ stock is currently priced at $40. At the 
end of the next period, the price of XYZ is expected to be in one of the following 
two states: $40 \cdot u$ or $40 \cdot d$. We know that $d < 1 < u$ but we do not know $d$ or $u$.
The interest rate is r = 0. European call options on XYZ with strike prices of 
$30, $40, $50, and $60 are priced at $10, $7, $10/3, and $0. Which one of these 
options is mispriced? Why? 


Exercise 4.5 Prove the equivalences (ii) $\Leftrightarrow$ (iii) $\Leftrightarrow$ (iv) in Theorem 4.2.


Exercise 4.6 Consider the setting of Proposition 4.1. Assume there is no 
arbitrage and thus a positive linear pricing rule exists. 


(a) Show that the linear pricing rule is unique if and only if the matrix S has 
full column rank. 

(b) Consider a new asset with payoff $S_1^{m+1}(\omega_j)$ per share in state $\omega_j$ at time 1.
Show that if the linear pricing rule is unique then there exists a portfolio
$y = [y_1 \ \cdots \ y_m]^T$ of the $m$ old assets that replicates the payoff of the new
asset; that is,

$$\sum_{i=1}^{m} S_1^{i}(\omega_j)\, y_i = S_1^{m+1}(\omega_j) \quad \text{for every state } \omega_j.$$

(c) Conclude that to rule out arbitrage, the price $S_0^{m+1}$ at time 0 of the new
asset must be equal to

$$S_0^{m+1} = \sum_{i=1}^{m} S_0^{i}\, y_i.$$


Exercise 4.7 Prove Theorem 4.4. 


Exercise 4.8 Both Theorem 4.4 and the linear programming model (4.8)
implicitly assume that the $i$th call can be bought or sold at the same price $p_i$.
In real markets, there is always a gap, called the bid-ask spread, between the
price a buyer pays for a security and the amount the seller collects.
Assume that the ask price of the $i$th call is $p_i^a$ and its bid price is $p_i^b$ with
$p_i^a > p_i^b$. Develop analogs of Theorem 4.4 and of (4.8) in the case where we can
only purchase the calls at their ask prices or sell them at their bid prices.


Exercise 4.9 Consider all the call options on the S&P 500 index or on a highly 
traded security that expire on the same day, about three months from today. 
Their current prices can be downloaded from the website of the Chicago Board 
of Options Exchange at www.cboe.com or several other market quote websites. 
Formulate the linear programming problem (4.7) (or, rather, the version you 
developed for Exercise 4.8 since market quotes will include bid and ask prices) 
to determine whether these prices contain any arbitrage opportunities. 

Sometimes, illiquid securities (those that are not traded very often) can have 
misleading prices since the reported price corresponds to the last transaction 
in that security, which may have happened several days ago, and if there were 
to be a new transaction, this value would change dramatically. As a result, it 
is quite possible that you will discover false “arbitrage opportunities” because 
of these misleading prices. Repeat this exercise but this time use only prices 
of call options that have had a trading volume of at least 100 on the day you 
downloaded the prices. 


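Exercise 4.10 Prove that, for the special case where $\Psi_i(S_T) = (S_T - K_i)^+$,
with $i = 1,\dots,n$, and $\Psi_{\text{new}}(S_T) = (S_T - K)^+$ with $K_1 < K < K_n$, the
static arbitrage upper bound $p^u_{\text{new}}$ on $p_{\text{new}}$ can be written as the following linear
programming model:

$$
\begin{aligned}
p^u_{\text{new}} = \min_x \quad & p^T x \\
\text{s.t.} \quad & Ax \ge b,
\end{aligned}
$$

where

$$
A = \begin{bmatrix}
K_2 - K_1 & 0 & 0 & \cdots & 0 \\
K_3 - K_1 & K_3 - K_2 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
K_n - K_1 & K_n - K_2 & K_n - K_3 & \cdots & 0 \\
1 & 1 & 1 & \cdots & 1
\end{bmatrix},
\qquad
b = \begin{bmatrix}
(K_2 - K)^+ \\
(K_3 - K)^+ \\
\vdots \\
(K_n - K)^+ \\
0
\end{bmatrix}.
$$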


Exercise 4.11 The purpose of this exercise is to see whether the results 
observed by Schaefer (1982) (see Section 4.5) occur in the current bond market. 
Only use non-callable bonds and notes. 

Consider first the class of tax-exempt investors. Using current data, form the
optimal “purchased” and “sold” bond portfolios using the linear program
presented in Section 4.5. Do you observe the same tax clientele effect as documented
by Schaefer for British government securities; namely, that the “purchased” portfolio
contains high coupon bonds and the “sold” portfolio is dominated by low coupon
bonds?

Repeat the same analysis with different types of taxable investors.


(a) Is there a clientele effect in the pricing of US government investments, 
with tax-exempt investors, or those without preferential treatment of capital 
gains, gravitating towards high coupon bonds? 

(b) Do you observe that not all high coupon bonds are desirable to investors 
without preferential treatment of capital gains? Nor are all low coupon bonds 
attractive to those with preferential treatment of capital gains. Can you find 
reasons why this may be the case? 


The dual price, say $u_t$, associated with the cash balance constraint at date $t$ in
(4.9) represents the present value of an additional dollar at time $t$. Explain why.
It follows that $u_t$ may be used to compute the term structure of spot interest
rates $R_t$, given by the relation

$$R_t = \left(\frac{1}{u_t}\right)^{1/t} - 1.$$

Compute this week’s term structure of spot interest rates for tax-exempt
investors.


Part II
Single-Period Models


Quadratic Programming: Theory
and Algorithms


5.1 Quadratic Programming

A quadratic program is an optimization problem whose objective is to minimize 
or maximize a quadratic function subject to a finite set of linear equality and 
inequality constraints. By flipping signs if necessary, a quadratic program can be 
written in the generic form: 

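$$
\begin{aligned}
\min_x \quad & \tfrac{1}{2} x^T Q x + c^T x \\
\text{s.t.} \quad & Ax = b \\
& Dx \ge d
\end{aligned}
\tag{5.1}
$$

for some vectors and matrices $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $d \in \mathbb{R}^p$, $A \in \mathbb{R}^{m \times n}$, $D \in \mathbb{R}^{p \times n}$,
$Q \in \mathbb{R}^{n \times n}$. As observed in Chapter 1 we may assume that $Q$ is a symmetric
matrix. The term quadratic programming model is also used to refer to a quadratic
program. We will use these terms interchangeably throughout the book.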

Quadratic programming models arise in a variety of practical contexts. The 
seminal mean-variance model of Markowitz and most of its variants for portfolio 
selection are quadratic programs as we illustrate in Example 5.1 below and 
discuss in full detail in Chapter 6. The popular ordinary least-squares and lasso 
estimation procedures in linear regression are also quadratic programs. Quadratic 
programs are also often solved as subproblems in the solution of more general 
nonlinear optimization problems. 

Observe that the constraint set in (5.1) is convex since it is a system of 
linear inequalities. Furthermore, the objective function of (5.1) is convex when 
Q is a positive semidefinite matrix. Throughout this chapter we assume that 
Q is symmetric and positive semidefinite. Therefore problem (5.1) is a convex 
program. 

A quadratic programming model is in standard form if it is written as follows: 

$$
\begin{aligned}
\min_x \quad & \tfrac{1}{2} x^T Q x + c^T x \\
\text{s.t.} \quad & Ax = b \\
& x \ge 0.
\end{aligned}
\tag{5.2}
$$


Example 5.1 (Asset allocation) Assume the one-year returns of the asset classes
large stocks, small stocks, and bonds have the following correlations and standard
deviations:

         Large   Small   Bonds   Standard deviation
Large    1       0.6     0.2     0.12
Small    0.6     1       0.5     0.20
Bonds    0.2     0.5     1       0.05

Determine the asset allocation of minimum risk, that is, find a portfolio com- 
prised of these three asset classes whose return has the lowest standard deviation. 
Assume the portfolio can only hold long positions in each of the asset classes. 


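This problem can be formulated as a quadratic programming model. To that
end, first construct the covariance matrix $V$ of asset returns: this is the matrix
whose $(i,j)$ entry is the covariance of asset $i$ and asset $j$; that is, $\rho_{ij}\,\sigma_i\,\sigma_j$.
Using matrix notation and ‘$\circ$’ to denote the componentwise product of matrices,
the covariance matrix can be computed as

$$
\begin{aligned}
V &= \begin{bmatrix} 1 & 0.6 & 0.2 \\ 0.6 & 1 & 0.5 \\ 0.2 & 0.5 & 1 \end{bmatrix}
\circ
\left( \begin{bmatrix} 0.12 \\ 0.20 \\ 0.05 \end{bmatrix}
\begin{bmatrix} 0.12 & 0.20 & 0.05 \end{bmatrix} \right) \\
&= \begin{bmatrix} 1 & 0.6 & 0.2 \\ 0.6 & 1 & 0.5 \\ 0.2 & 0.5 & 1 \end{bmatrix}
\circ
\begin{bmatrix} 0.0144 & 0.024 & 0.006 \\ 0.024 & 0.04 & 0.01 \\ 0.006 & 0.01 & 0.0025 \end{bmatrix} \\
&= \begin{bmatrix} 0.0144 & 0.0144 & 0.0012 \\ 0.0144 & 0.04 & 0.005 \\ 0.0012 & 0.005 & 0.0025 \end{bmatrix}.
\end{aligned}
$$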
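The componentwise product above is easy to reproduce numerically. The following is a minimal sketch in plain Python (no external libraries); the function name `covariance_matrix` is chosen here for illustration and is not part of the text.

```python
# Correlations and standard deviations from Example 5.1
# (order: large stocks, small stocks, bonds).
correl = [
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]
sigma = [0.12, 0.20, 0.05]


def covariance_matrix(correl, sigma):
    """Return V with V[i][j] = correl[i][j] * sigma[i] * sigma[j]."""
    n = len(sigma)
    return [[correl[i][j] * sigma[i] * sigma[j] for j in range(n)]
            for i in range(n)]


V = covariance_matrix(correl, sigma)
for row in V:
    # Rounded only for display; V itself keeps full precision.
    print([round(v, 6) for v in row])
```

Each entry is the correlation scaled by the two standard deviations, which is exactly the componentwise product of the correlation matrix with the outer product of the standard deviation vector.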


We are now ready to describe the quadratic programming formulation for 
the above asset allocation problem. (A more detailed discussion is given in 
Chapter 6.) 


Quadratic programming model for asset allocation

Variables:

    $x_i$: percentage of the portfolio invested in asset $i$ for $i = 1,2,3$.

Objective (minimize the variance of the portfolio return):

$$
\min_x \; x^T V x = \min_{x_1,x_2,x_3} \left( 0.0144 x_1^2 + 0.04 x_2^2 + 0.0025 x_3^2 + 0.0288 x_1 x_2 + 0.0024 x_1 x_3 + 0.01 x_2 x_3 \right)
$$

Constraints:

    $x_1 + x_2 + x_3 = 1$ (percentages add up to one)
    $x_1, x_2, x_3 \ge 0$ (long-only positions).


Observe that even in this small example the quadratic objective is much 
more concise and easier to write using matrix notation. 


We now discuss the special case of a convex quadratic program without
constraints. As Example 5.3 below illustrates, this kind of model arises naturally in
the ordinary least-squares procedure.

Consider a quadratic program without constraints:

$$\min_x \; \tfrac{1}{2} x^T Q x + c^T x. \tag{5.3}$$
The optimality conditions in this case are as follows. 


Theorem 5.2 Let $c \in \mathbb{R}^n$, $Q \in \mathbb{R}^{n \times n}$ and assume $Q$ is symmetric and positive
semidefinite. If (5.3) is bounded, then it attains its minimum. Furthermore, a
point $x \in \mathbb{R}^n$ is an optimal solution to (5.3) if and only if

$$Qx + c = 0. \tag{5.4}$$

When $Q$ is positive definite, the problem (5.3) has the unique minimizer $x =
-Q^{-1}c$. When $Q$ is positive semidefinite but not positive definite, the matrix $Q$
is singular and the problem (5.3) is either unbounded or has multiple solutions.
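A small worked instance may help to illustrate the singular case. The matrix $Q$ and the two vectors $c$ below are chosen here purely for illustration; they do not come from the text.

```latex
Q = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
\quad\Longrightarrow\quad
\tfrac{1}{2} x^T Q x + c^T x = \tfrac{1}{2}(x_1 + x_2)^2 + c_1 x_1 + c_2 x_2 .

\text{Case } c = (-1,-1)^T:\quad
\tfrac{1}{2}(x_1+x_2)^2 - (x_1+x_2) \ \ge\ -\tfrac{1}{2},
\qquad Qx + c = 0 \iff x_1 + x_2 = 1 .

\text{Case } c = (-1,0)^T:\quad
x = (t,-t)^T \ \text{gives objective value } -t \to -\infty \ \text{as } t \to \infty,
\qquad Qx + c = 0 \iff x_1 + x_2 = 1 \ \text{and}\ x_1 + x_2 = 0 .
```

In the first case the problem is bounded and every point on the line $x_1 + x_2 = 1$ is a minimizer, so there are multiple solutions. In the second case the system (5.4) is inconsistent and the problem is unbounded, in line with the dichotomy stated above.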


Example 5.3 (Ordinary least squares) Assume $(x_i, y_i)$, for $i = 1,\dots,N$, is
a random sample drawn from the joint distribution of $X, Y$ where $X, Y$ are
respectively $\mathbb{R}^p$-valued and $\mathbb{R}$-valued random variables. Using the training data
$(x_i, y_i)$, with $i = 1,\dots,N$, estimate a vector of coefficients $\beta$ for the linear model

$$Y = \beta^T X + \epsilon.$$

The most popular approach to this problem is to find the estimate of $\beta$ that
solves the following least-squares problem:

$$\min_{\beta} \; \sum_{i=1}^{N} (\beta^T x_i - y_i)^2.$$

Observe that

$$\sum_{i=1}^{N} (\beta^T x_i - y_i)^2 = (X\beta - y)^T (X\beta - y) = \beta^T X^T X \beta - 2 y^T X \beta + y^T y$$

for

$$
X = \begin{bmatrix} x_1^T \\ \vdots \\ x_N^T \end{bmatrix}, \qquad
y = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix}.
$$

Hence the least-squares problem can be formulated as follows.

Quadratic programming formulation for least-squares estimation

Variables:

    $\beta$: vector of coefficients in the linear model $Y = \beta^T X + \epsilon$.

Objective:

$$\min_{\beta} \; \tfrac{1}{2} \beta^T Q \beta - b^T \beta,$$

where $Q := X^T X$, $b := X^T y$.

Constraints: None.

By applying Theorem 5.2 we obtain the widely known solution to the
least-squares problem:

$$\hat{\beta} := Q^{-1} b = (X^T X)^{-1} X^T y,$$

provided the $N \times p$ matrix $X$ has full column rank. This latter condition usually
holds in the typical practical situation when there are more observations than
predictor variables; that is, when $N > p$. However, the case $N < p$ occurs
as well. In this kind of situation the matrix $X$ is never full column rank so
the ordinary least-squares approach is not appropriate. Section 5.6.2 describes
two popular variants for this kind of situation, namely ridge regression and
lasso regression, both of which can be seen as modifications of the ordinary
least-squares procedure.
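As a small illustration of the formula $\hat{\beta} = (X^T X)^{-1} X^T y$, the sketch below solves the normal equations for $p = 2$ in plain Python. It assumes, for illustration only, that each feature vector has the form $[1, x_i]$ (a constant plus one predictor); the function name and the toy data are not from the text.

```python
def ols_two_coefficients(xs, ys):
    """Solve (X^T X) beta = X^T y when the rows of X are [1, x_i]."""
    n = len(xs)
    # Entries of Q = X^T X and b = X^T y.
    q11, q12, q22 = float(n), sum(xs), sum(x * x for x in xs)
    b1, b2 = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = q11 * q22 - q12 * q12
    if det == 0:
        raise ValueError("X does not have full column rank")
    # Cramer's rule for the 2x2 system Q beta = b.
    beta0 = (b1 * q22 - q12 * b2) / det
    beta1 = (q11 * b2 - q12 * b1) / det
    return beta0, beta1


# Toy data lying exactly on the line y = 1 + 2x.
beta0, beta1 = ols_two_coefficients([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
print(beta0, beta1)
```

The `ValueError` branch corresponds to the rank-deficient situation discussed above, in which $X^T X$ is singular and the least-squares solution is not unique.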


Numerical Quadratic Programming Solvers 


As with linear programming, there are a variety of highly efficient, fast, and 
reliable commercial and open-source software packages for convex quadratic 
programming. Most of these packages implement versions of the algorithms 
sketched in Section 5.5 below. We illustrate two of these solvers by applying 
them to Example 5.1. 


Excel Solver 


Figure 5.1 displays a printout of an Excel spreadsheet implementation of the 
quadratic programming model for Example 5.1 as well as the dialog box obtained 
when we run the Excel add-in Solver. The spreadsheet model contains the three 
components of the quadratic program. The decision variables are in the range 
B20:D20. The objective function is in cell E22. The Excel formula in this cell, 
using matrix operations, is as follows: 


MMULT(B20:D20, MMULT(B16:D18, TRANSPOSE(B20:D20))).


The left-hand and right-hand sides of the equality constraint are in cells E20 and 
G20 respectively. 


MATLAB CVX 


Figure 5.2 displays a CVX script for the same problem. The script can be run
provided the freely available CVX toolbox is installed.

Using either of these solvers we obtain the optimal solution to the problem in
Example 5.1:

$$x^* = \begin{bmatrix} 0.0897 \\ 0 \\ 0.9103 \end{bmatrix}.$$

Figure 5.1 Spreadsheet implementation and the Solver dialog box for the asset
allocation model

    % Matlab CVX code for asset allocation model
    n = 3;
    sigma = [0.12; 0.2; 0.05];
    correl = [1 0.6 0.2; 0.6 1 0.5; 0.2 0.5 1];
    V = correl.*(sigma*sigma');

    cvx_begin
        variables x(n);
        minimize (x'*V*x)
        ones(1,3)*x == 1;
        x >= zeros(n,1);
    cvx_end

Figure 5.2 MATLAB CVX code for the asset allocation model
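The solution reported above for Example 5.1 can be cross-checked without any optimization software. The sketch below is a brute-force grid search over long-only portfolios with weights in multiples of 1%; it is not the method used by Solver or CVX, and the function names are chosen here for illustration.

```python
# Covariance matrix from Example 5.1.
V = [
    [0.0144, 0.0144, 0.0012],
    [0.0144, 0.04, 0.005],
    [0.0012, 0.005, 0.0025],
]


def variance(x, V):
    """Portfolio variance x^T V x."""
    return sum(x[i] * V[i][j] * x[j] for i in range(3) for j in range(3))


def grid_minimum(V, steps=100):
    """Search all weights (i, j, k)/steps with i + j + k = steps, all >= 0."""
    best_idx, best_val = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            val = variance([i / steps, j / steps, k / steps], V)
            if val < best_val:
                best_idx, best_val = (i, j, k), val
    return best_idx, best_val


best_idx, best_val = grid_minimum(V)
reported = [0.0897, 0.0, 0.9103]   # solution quoted in the text
print(best_idx, best_val, variance(reported, V))
```

The best grid point should sit next to the reported solution, and the reported portfolio should have a variance no larger than that of any grid point, since the grid is only a coarse subset of the feasible set.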


Sensitivity Analysis 


As is the case for linear programming, the process of solving a quadratic program 
also generates some interesting sensitivity information via the so-called Lagrange 
multipliers associated with the constraints. Assume the constraints of a quadratic 
program, and hence the Lagrange multipliers, are indexed by i = 1,...,m. 
The Lagrange multiplier $y_i^*$ of the $i$th constraint has the following sensitivity
interpretation:

    If the right-hand side of the $i$th constraint changes by $\Delta$, then the optimal
    value of the quadratic program changes by approximately $\Delta \cdot y_i^*$ for
    small $\Delta$.

Unlike the shadow prices of a linear program, the Lagrange multipliers only
give an approximation of the change in the optimal objective value. The situation
is akin to how the derivative of a quadratic (or more general nonlinear) function
at a particular point gives an approximation of the change in the function value
when that point changes.
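The approximate nature of this interpretation can be seen on a tiny problem that is solvable by hand. The sketch below uses the toy program $\min x_1^2 + x_2^2$ subject to $x_1 + x_2 = b$, which is an illustrative choice made here and not an example from the text. By symmetry its minimizer is $x_1 = x_2 = b/2$, and stationarity of the Lagrangian $x_1^2 + x_2^2 + y(b - x_1 - x_2)$ gives $y^* = 2x_i = b$.

```python
def optimal_value(b):
    """Optimal value of  min x1^2 + x2^2  s.t.  x1 + x2 = b  (equals b^2 / 2)."""
    x = [b / 2.0, b / 2.0]          # minimizer, by symmetry
    return x[0] ** 2 + x[1] ** 2


def multiplier(b):
    """Lagrange multiplier y* from the stationarity condition 2 * x_i = y."""
    return b


b = 1.0
y_star = multiplier(b)
for delta in (0.1, 0.01, 0.001):
    exact = optimal_value(b + delta) - optimal_value(b)
    estimate = delta * y_star
    # The gap between the exact change and delta * y* is delta^2 / 2 here.
    print(delta, exact, estimate, exact - estimate)
```

The error of the estimate shrinks quadratically with the size of the perturbation, which is the sense in which the multiplier gives only a first-order approximation.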

Both Excel Solver and MATLAB CVX compute the Lagrange multipliers 
implicitly. To make this information explicit in Excel Solver, we request a 
sensitivity report after running Solver as shown in Figure 5.3. 


Figure 5.3 Requesting sensitivity report in Solver


Figure 5.4 displays the sensitivity report for Example 5.1. The values $y_i^*$ can
be found in the column labeled “Lagrange Multiplier”. In CVX this information
can also be obtained by including a line of code to save the dual information y
as shown in Figure 5.5. Both solvers yield the dual value $y^* = 0.0047669$.

Variable Cells

Cell     Name              Final Value    Reduced Gradient
$B$20    Portfolio large   0.089655197    0
$C$20    Portfolio small   0              0.006918652
$D$20    Portfolio bonds   0.910345803    0

Constraints

Cell     Name              Final Value    Lagrange Multiplier
$E$20    Portfolio total   1.000001       0.004766915

Figure 5.4 Sensitivity report

    % Matlab CVX code for asset allocation model
    % Request dual variables
    n = 3;
    sigma = [0.12; 0.2; 0.05];
    correl = [1 0.6 0.2; 0.6 1 0.5; 0.2 0.5 1];
    V = correl.*(sigma*sigma');

    cvx_begin
        variables x(n);
        dual variable y;
        minimize (x'*V*x)
        y: ones(1,3)*x == 1;
        x >= zeros(n,1);
    cvx_end

Figure 5.5 MATLAB CVX code with dual variables


*Duality and Optimality Conditions


As in linear programming, there is a dual quadratic program associated with
every primal quadratic programming problem, and this dual can be obtained via
the Lagrangian function. Throughout this section consider the primal quadratic
program

$$
\begin{aligned}
\min_x \quad & \tfrac{1}{2} x^T Q x + c^T x \\
\text{s.t.} \quad & Ax = b \\
& Dx \ge d,
\end{aligned}
\tag{5.5}
$$

where $c \in \mathbb{R}^n$, $Q \in \mathbb{R}^{n \times n}$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $D \in \mathbb{R}^{p \times n}$, $d \in \mathbb{R}^p$, and $Q$ is
symmetric and positive semidefinite.
The Lagrangian function associated with (5.5) is

$$L(x,y,s) := \tfrac{1}{2} x^T Q x + c^T x + y^T (b - Ax) + s^T (d - Dx).$$

The constraints of (5.5) can be encoded via the Lagrangian function through the
following observation: For a given vector $x$

$$
\max_{y,\; s \ge 0} L(x,y,s) =
\begin{cases}
\tfrac{1}{2} x^T Q x + c^T x & \text{if } Ax = b \text{ and } Dx \ge d \\
+\infty & \text{otherwise.}
\end{cases}
$$

Therefore the primal problem (5.5) can be written as

$$\min_x \; \max_{y,\; s \ge 0} L(x,y,s).$$

The dual problem is obtained by flipping the order of the min and max operations:

$$\max_{y,\; s \ge 0} \; \min_x L(x,y,s).$$

It is easy to see that the dual problem can be written as follows:

$$
\begin{aligned}
\max_{x,y,s} \quad & b^T y + d^T s - \tfrac{1}{2} x^T Q x \\
\text{s.t.} \quad & A^T y + D^T s - Qx = c \\
& s \ge 0.
\end{aligned}
\tag{5.6}
$$

In particular, when the primal problem is in standard form (5.2), the dual
problem is

$$
\begin{aligned}
\max_{x,y,s} \quad & b^T y - \tfrac{1}{2} x^T Q x \\
\text{s.t.} \quad & A^T y - Qx + s = c \\
& s \ge 0.
\end{aligned}
$$

Observe that the dual problem of a quadratic program is again a quadratic 
program. Note that, unlike the case of linear programming, some primal-like 
variables x also appear in the dual problem. As in linear programming, there is 
a deep connection between the primal problem (5.5) and its dual (5.6). The next 
result follows by construction. 


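Theorem 5.4 (Weak duality) Assume $x$ is a feasible point for (5.5) and $(\bar{x}, y, s)$
is a feasible point for (5.6). Then

$$b^T y + d^T s - \tfrac{1}{2} \bar{x}^T Q \bar{x} \le \tfrac{1}{2} x^T Q x + c^T x.$$

Proof If $x$ and $(\bar{x}, y, s)$ satisfy the above assumptions then

$$
\begin{aligned}
b^T y + d^T s - \tfrac{1}{2} \bar{x}^T Q \bar{x}
&\le (Ax)^T y + (Dx)^T s - \tfrac{1}{2} \bar{x}^T Q \bar{x} \\
&= (A^T y + D^T s)^T x - \tfrac{1}{2} \bar{x}^T Q \bar{x} \\
&= (c + Q\bar{x})^T x - \tfrac{1}{2} \bar{x}^T Q \bar{x} \\
&\le \tfrac{1}{2} x^T Q x + c^T x.
\end{aligned}
$$

The first step uses $Ax = b$, $Dx \ge d$, and $s \ge 0$; the third uses the dual constraint
$A^T y + D^T s = c + Q\bar{x}$; and the last uses $\tfrac{1}{2}(x - \bar{x})^T Q (x - \bar{x}) \ge 0$.


The following much deeper result also holds.

Theorem 5.5 (Strong duality) Assume one of the problems (5.5) or (5.6) is
feasible. Then this problem is bounded if and only if the other one is feasible. In
that case both problems have optimal solutions and their optimal values are the
same.


We refer the reader to Güler (2010) or Nocedal and Wright (2006) for a proof
of Theorem 5.5. This result is closely tied to certain kinds of separation theorems
for convex sets. For details see Güler (2010, chapters 6 and 11). A powerful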
consequence of Theorem 5.5 is the following characterization of the solutions to 
both (5.5) and (5.6). 


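Theorem 5.6 (Optimality conditions) The vectors $x \in \mathbb{R}^n$ and $(\bar{x}, y, s) \in \mathbb{R}^n \times
\mathbb{R}^m \times \mathbb{R}^p$ are optimal solutions to (5.5) and (5.6) respectively if and only if
$Qx = Q\bar{x}$ and

$$
\begin{aligned}
Qx + c - A^T y - D^T s &= 0 \\
Ax - b &= 0 \\
Dx - d &\ge 0 \\
s &\ge 0 \\
(Dx - d)_i\, s_i &= 0, \quad i = 1,\dots,p.
\end{aligned}
\tag{5.7}
$$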

For a quadratic program in standard form (5.2), the optimality conditions (5.7)
can be written as follows:

$$
\begin{aligned}
-Qx + A^T y + s &= c \\
Ax &= b \\
x &\ge 0 \\
s &\ge 0 \\
x_i s_i &= 0, \quad i = 1,\dots,n.
\end{aligned}
\tag{5.8}
$$

Observe that (5.8) nicely extends the optimality conditions (2.9) for linear
programming in standard form.

The optimality conditions (5.7) can be seen as “saddle-point” conditions for
the Lagrangian function

$$L(x,y,s) = \tfrac{1}{2} x^T Q x + c^T x + y^T (b - Ax) + s^T (d - Dx).$$


We next discuss the special case of a quadratic program with equality
constraints only. Consider the problem

$$
\begin{aligned}
\min_x \quad & \tfrac{1}{2} x^T Q x + c^T x \\
\text{s.t.} \quad & Ax = b,
\end{aligned}
\tag{5.9}
$$

where $c \in \mathbb{R}^n$, $Q \in \mathbb{R}^{n \times n}$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $Q$ is symmetric and positive
semidefinite. In this case the optimality conditions (5.7) simplify to

$$
\begin{aligned}
Qx + c - A^T y &= 0 \\
Ax - b &= 0.
\end{aligned}
\tag{5.10}
$$

The optimality conditions (5.10) in turn can be stated in terms of the Lagrangian
function of (5.9):

$$L(x, y) = \tfrac{1}{2} x^T Q x + c^T x + y^T (b - Ax).$$

Indeed observe that (5.10) can be succinctly written as

$$\nabla L(x,y) = 0.$$

When $Q$ is positive definite and $A$ has full row rank, problem (5.9) has a unique
minimizer $x$ and a unique Lagrange multiplier $y$ given by

$$
\begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} Q & -A^T \\ A & 0 \end{bmatrix}^{-1}
\begin{bmatrix} -c \\ b \end{bmatrix}.
$$

In particular, if $Q$ is positive definite and $A$ has full row rank, then the minimizer
and vector of Lagrange multipliers for the problem

$$
\begin{aligned}
\min_x \quad & \tfrac{1}{2} x^T Q x \\
\text{s.t.} \quad & Ax = b
\end{aligned}
\tag{5.11}
$$

are respectively

$$
\begin{aligned}
x^* &= Q^{-1} A^T (A Q^{-1} A^T)^{-1} b \\
y^* &= (A Q^{-1} A^T)^{-1} b.
\end{aligned}
$$

Example 5.7 (Asset allocation) Consider the same problem as in Example 5.1 
but assume this time that the portfolio is allowed to hold short positions. 


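The formulation for this modification of Example 5.1 is straightforward: just
drop the non-negativity constraint on the variables. Thus we obtain the quadratic
programming model

$$
\begin{aligned}
\min_x \quad & \tfrac{1}{2} x^T V x \\
\text{s.t.} \quad & \mathbf{1}^T x = 1.
\end{aligned}
$$

From the above discussion it readily follows that the optimal solution and
Lagrange multiplier are

$$
x^* = \frac{1}{\mathbf{1}^T V^{-1} \mathbf{1}}\, V^{-1} \mathbf{1}, \qquad
y^* = \frac{1}{\mathbf{1}^T V^{-1} \mathbf{1}}.
$$

For the particular value of $V$ in Example 5.1 we get the following optimal solution
and Lagrange multiplier

$$
x^* = \begin{bmatrix} 0.1934 \\ -0.1406 \\ 0.9472 \end{bmatrix}, \qquad y^* = 0.001897074.
$$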
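The closed-form expressions for Example 5.7 can be evaluated with a few lines of plain Python. The sketch below solves $Vz = \mathbf{1}$ by Gaussian elimination instead of forming $V^{-1}$ explicitly; the helper `solve_linear` is written here for illustration and is not part of the text.

```python
def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(M)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        z[r] = (a[r][n] - sum(a[r][k] * z[k] for k in range(r + 1, n))) / a[r][r]
    return z


# Covariance matrix from Example 5.1.
V = [
    [0.0144, 0.0144, 0.0012],
    [0.0144, 0.04, 0.005],
    [0.0012, 0.005, 0.0025],
]
z = solve_linear(V, [1.0, 1.0, 1.0])   # z = V^{-1} 1
total = sum(z)                         # 1^T V^{-1} 1
x_star = [zi / total for zi in z]      # minimum-variance weights
y_star = 1.0 / total                   # Lagrange multiplier
print(x_star, y_star)
```

With the factor $\tfrac{1}{2}$ in the objective, the multiplier $y^*$ coincides with the variance of the optimal portfolio, since $Vx^* = y^* \mathbf{1}$ and $\mathbf{1}^T x^* = 1$.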
*Algorithms


We next sketch extensions of the two main algorithmic schemes for linear pro- 
gramming discussed in Chapter 2. The first scheme, namely active-set methods, 
can be seen as an analog of the simplex method. The second scheme, namely 
interior-point methods, is a straightforward extension from the linear program- 
ming to the quadratic programming context. 


Active-Set Methods 


Active-set methods are based on the following key observation. Assume $\bar{x}$ is an
optimal solution to (5.5) and

$$I := \{ i = 1,\dots,p : (D\bar{x} - d)_i = 0 \}.$$

Then the optimality conditions (5.7) can be rewritten as

$$
\begin{aligned}
Qx + c - A^T y - D_I^T s_I &= 0 \\
Ax - b &= 0 \\
D_I x - d_I &= 0 \\
s_I &\ge 0.
\end{aligned}
\tag{5.12}
$$

If we ignore the last constraint $s_I \ge 0$, the remaining conditions in (5.12) are
precisely the optimality conditions of the problem

$$
\begin{aligned}
\min_x \quad & \tfrac{1}{2} x^T Q x + c^T x \\
\text{s.t.} \quad & Ax = b \\
& D_I x = d_I.
\end{aligned}
\tag{5.13}
$$

This suggests an algorithmic strategy to solve (5.5): guess the active set $I$
and solve the subproblem (5.13). If the solution $\bar{x}$ to this subproblem satisfies
the other conditions in (5.7) then stop. Otherwise, make a new guess for $I$.
Algorithm 5.1 gives a possible version of this strategy.

Each main iteration of Algorithm 5.1 requires solving the following subproblem
for some current trial solution $\bar{x}$ and trial active set $I$:

$$
\begin{aligned}
\min_{\Delta x} \quad & \tfrac{1}{2} (\Delta x)^T Q \Delta x + (Q\bar{x} + c)^T \Delta x \\
\text{s.t.} \quad & A \Delta x = 0 \\
& D_I \Delta x = 0.
\end{aligned}
\tag{5.14}
$$

To update the trial solution we also need to compute the step length

$$
\alpha := \min\left\{ 1, \ \min_{\substack{i \notin I \\ D_i \Delta x < 0}} \frac{d_i - D_i \bar{x}}{D_i \Delta x} \right\}.
\tag{5.15}
$$

Algorithm 5.1 Active-set method
 1: choose $x_0$ feasible for (5.5) and $I_0 \subseteq \{ i : D_i x_0 = d_i, \ i = 1,\dots,p \}$
 2: for $k = 0, 1, \dots$ do
 3:     solve (5.14) for $I = I_k$ and $\bar{x} = x_k$
 4:     if $\Delta x = 0$ then
 5:         compute the Lagrange multipliers $s_I$ of (5.14) for $I = I_k$ and $\bar{x} = x_k$
 6:         if $s_I \ge 0$ then HALT: $\bar{x}$ is an optimal solution to (5.5)
 7:         else
 8:             let $j := \arg\min_{i \in I_k} s_i$, $I_{k+1} := I_k \setminus \{j\}$, and $x_{k+1} := x_k$
 9:         end if
10:     else
11:         compute $\alpha$ via (5.15) for $I = I_k$ and let $x_{k+1} := x_k + \alpha \Delta x$
12:         if $\alpha$ has a blocking constraint $j$ then $I_{k+1} := I_k \cup \{j\}$
13:         else $I_{k+1} := I_k$
14:         end if
15:     end if
16: end for


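We say that the step length $\alpha$ computed in (5.15) has a blocking constraint
$j \notin I$ if $D_j \Delta x < 0$ and

$$\alpha = \frac{d_j - D_j \bar{x}}{D_j \Delta x}.$$

And we say that $\alpha$ has no blocking constraints when

$$\alpha = 1 < \min_{\substack{i \notin I \\ D_i \Delta x < 0}} \frac{d_i - D_i \bar{x}}{D_i \Delta x}.$$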
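The step-length rule (5.15) and the notion of a blocking constraint translate directly into a short routine. The sketch below is one possible reading in plain Python: constraints are stored as rows of `D` with right-hand sides `d`, and a ratio exactly equal to 1 is treated as blocking, which is an assumption consistent with the strict inequality in the no-blocking condition. The function name and the test data are illustrative only.

```python
def step_length(D, d, x_bar, dx, active):
    """Return (alpha, blocking) following (5.15).

    D, d describe the inequality constraints D x >= d, `active` is the trial
    active set I, and `blocking` is the index attaining the minimum ratio
    (None when the full step alpha = 1 can be taken).
    """
    best_ratio, best_index = float("inf"), None
    for i, row in enumerate(D):
        if i in active:
            continue
        slope = sum(r * v for r, v in zip(row, dx))          # D_i * dx
        if slope < 0:
            value = sum(r * v for r, v in zip(row, x_bar))   # D_i * x_bar
            ratio = (d[i] - value) / slope
            if ratio < best_ratio:
                best_ratio, best_index = ratio, i
    if best_ratio <= 1.0:
        return best_ratio, best_index
    return 1.0, None


# Constraints x1 >= 0 and x2 >= 0, current point (1, 1).
D = [[1.0, 0.0], [0.0, 1.0]]
d = [0.0, 0.0]
print(step_length(D, d, [1.0, 1.0], [-2.0, -0.5], set()))   # step cut by x1 >= 0
print(step_length(D, d, [1.0, 1.0], [-0.5, 1.0], set()))    # full step
```

In the first call the direction would violate $x_1 \ge 0$ after half a step, so that constraint is blocking; in the second call no inactive constraint is reached within a full step.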


Interior-Point Methods 


For notational convenience and without loss of generality we assume that the 
problem of interest is in standard form (5.2). 

As in the linear programming case (Section 2.7.3), interior-point methods
generate a sequence of iterates that satisfy $x, s > 0$. Each iteration of the
algorithm aims to make progress towards satisfying $-Qx + A^T y + s = c$, $Ax = b$,
and $x_i s_i = 0$, with $i = 1,\dots,n$.

As before we use the following notational convention: Given a vector $x \in \mathbb{R}^n$,
let $X \in \mathbb{R}^{n \times n}$ denote the diagonal matrix defined by $X_{ii} = x_i$, with $i = 1,\dots,n$,
and let $\mathbf{1} \in \mathbb{R}^n$ denote the vector whose components are all 1s. The optimality
conditions (5.8) can be restated as

$$
\begin{bmatrix} -Qx + A^T y + s - c \\ Ax - b \\ XS\mathbf{1} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \qquad x, s \ge 0.
$$

Given $\mu > 0$, let $(x(\mu), y(\mu), s(\mu))$ be the solution to the following perturbed
version of the above optimality conditions:

$$
\begin{bmatrix} -Qx + A^T y + s - c \\ Ax - b \\ XS\mathbf{1} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \mu \mathbf{1} \end{bmatrix}, \qquad x, s > 0.
$$

The first condition above can be written as $r_\mu(x, y, s) = 0$ for the residual vector

$$
r_\mu(x, y, s) := \begin{bmatrix} -Qx + A^T y + s - c \\ Ax - b \\ XS\mathbf{1} - \mu \mathbf{1} \end{bmatrix}.
$$

The central path is the set $\{ (x(\mu), y(\mu), s(\mu)) : \mu > 0 \}$. It is intuitively clear that
$(x(\mu), y(\mu), s(\mu))$ converges to an optimal solution to both (5.2) and its dual as $\mu \to 0$.
This suggests the following algorithmic strategy. Suppose $(x, y, s)$ is “near”
$(x(\mu), y(\mu), s(\mu))$ for some $\mu > 0$. Use $(x, y, s)$ to move to a better point
$(x^+, y^+, s^+)$ “near” $(x(\mu^+), y(\mu^+), s(\mu^+))$ for some $\mu^+ < \mu$.

It can be shown that if a point $(x, y, s)$ is on the central path, then the
corresponding value of $\mu$ satisfies $x^T s = n\mu$. Likewise, given $x, s > 0$, define

$$\mu(x, s) := \frac{x^T s}{n}.$$

To move from a current point $(x, y, s)$ to a new point, we use the so-called
Newton step; that is, the solution to the system of equations

$$
\begin{bmatrix} -Q & A^T & I \\ A & 0 & 0 \\ S & 0 & X \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \\ \Delta s \end{bmatrix}
= \begin{bmatrix} c + Qx - A^T y - s \\ b - Ax \\ \mu \mathbf{1} - XS\mathbf{1} \end{bmatrix}.
\tag{5.16}
$$

Algorithm 5.2 presents a template for an interior-point method. 


Algorithm 5.2 Interior-point method for quadratic programming
1: choose $x^0, s^0 > 0$
2: for $k = 0, 1, \dots$ do
3:     solve the Newton system (5.16) for $(x, y, s) = (x^k, y^k, s^k)$ and $\mu := 0.1\,\mu(x^k, s^k)$
4:     choose a step length $\alpha \in (0, 1]$ and set $(x^{k+1}, y^{k+1}, s^{k+1}) = (x^k, y^k, s^k) + \alpha(\Delta x, \Delta y, \Delta s)$
5: end for


The step length $\alpha$ in step 4 should be chosen so that $x^{k+1}, s^{k+1} > 0$ and
the size of $r_\mu(x^{k+1}, y^{k+1}, s^{k+1})$ is sufficiently smaller than that of $r_\mu(x^k, y^k, s^k)$. A
line-search procedure such as the one described in Algorithm 2.4 in Chapter 2 can
be used for choosing the step length $\alpha$.
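To make the template concrete, the sketch below applies it to the long-only asset allocation problem of Example 5.1 written in standard form with $Q = 2V$, $c = 0$, $A = \mathbf{1}^T$, $b = 1$. It is a simplified illustration and not a production solver: the starting point, the stopping test, and the step-length rule (90% of the distance to the boundary, in place of the line search mentioned above) are assumptions made here, and the helper names are not from the text.

```python
def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (a[r][n] - sum(a[r][k] * z[k] for k in range(r + 1, n))) / a[r][r]
    return z


def interior_point_qp(Q, c, A, b, sigma=0.1, max_iter=100, tol=1e-10):
    """Sketch of Algorithm 5.2 for min (1/2) x^T Q x + c^T x, Ax = b, x >= 0."""
    n, m = len(c), len(b)
    x, y, s = [1.0] * n, [0.0] * m, [1.0] * n        # assumed starting point
    for _ in range(max_iter):
        mu = sum(xi * si for xi, si in zip(x, s)) / n
        # Right-hand side of (5.16) with target value sigma * mu(x, s).
        r_dual = [c[i] + sum(Q[i][j] * x[j] for j in range(n))
                  - sum(A[k][i] * y[k] for k in range(m)) - s[i] for i in range(n)]
        r_prim = [b[k] - sum(A[k][j] * x[j] for j in range(n)) for k in range(m)]
        r_comp = [sigma * mu - x[i] * s[i] for i in range(n)]
        if mu < tol and max(abs(v) for v in r_dual + r_prim) < tol:
            break
        # Assemble the Newton matrix [-Q A^T I; A 0 0; S 0 X].
        size = 2 * n + m
        M = [[0.0] * size for _ in range(size)]
        for i in range(n):
            for j in range(n):
                M[i][j] = -Q[i][j]
            for k in range(m):
                M[i][n + k] = A[k][i]
                M[n + k][i] = A[k][i]
            M[i][n + m + i] = 1.0
            M[n + m + i][i] = s[i]
            M[n + m + i][n + m + i] = x[i]
        step = solve_linear(M, r_dual + r_prim + r_comp)
        dx, dy, ds = step[:n], step[n:n + m], step[n + m:]
        # Keep x and s strictly positive: go at most 90% of the way to the boundary.
        alpha = 1.0
        for v, dv in list(zip(x, dx)) + list(zip(s, ds)):
            if dv < 0:
                alpha = min(alpha, 0.9 * (-v / dv))
        x = [v + alpha * dv for v, dv in zip(x, dx)]
        y = [v + alpha * dv for v, dv in zip(y, dy)]
        s = [v + alpha * dv for v, dv in zip(s, ds)]
    return x, y, s


V = [
    [0.0144, 0.0144, 0.0012],
    [0.0144, 0.04, 0.005],
    [0.0012, 0.005, 0.0025],
]
Q = [[2.0 * v for v in row] for row in V]   # so that (1/2) x^T Q x = x^T V x
x_opt, y_opt, s_opt = interior_point_qp(Q, [0.0, 0.0, 0.0], [[1.0, 1.0, 1.0]], [1.0])
print(x_opt, y_opt, s_opt)
```

The iterates should approach the solution and the Lagrange multiplier reported earlier for Example 5.1, with the dual slack $s_2$ playing the role of the reduced gradient of the asset that is not held.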
Applications to Machine Learning


We next discuss some iconic applications of quadratic programming to machine 
learning. We must note that the literature on optimization models in machine 
learning is vast and continues to grow at a rapid pace. For a more detailed 
discussion on this timely subject, we refer the reader to the excellent textbooks 
by Friedman et al. (2001), Sra et al. (2012), and Vapnik (2013). 


Binary Classification and Support Vector Machines 


Classification problems constitute an important class of problems in financial 
mathematics that can be solved using optimization models and techniques. In a 
classification problem we have a vector of features describing an entity and the 
goal is to analyze these features to determine which class each entity belongs to, 
among two (or more) classes. For example, the classes might be “growth stocks” 
and “value stocks”, and the entities (stocks) may be described by a feature vector 
that contains elements such as stock price, price-earnings ratio, growth rate for 
the previous periods, growth estimates, etc. 

Mathematical approaches to classification often start with a training exercise. 
One is supplied with a list of entities, their feature vectors, and the classes they 
belong to. From this information, one tries to extract a mathematical structure 
for the entity classes so that additional entities can be classified using this 
mathematical structure and their feature vectors. For two-class classification, 
a hyperplane is probably the simplest mathematical structure that can be used 
to separate the feature vectors of these two different classes. Of course, there 
may not be any hyperplane that separates two sets of vectors. When such a 
hyperplane exists, we say that the two sets can be linearly separated. 

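Consider feature vectors $a_i \in \mathbb{R}^n$ for $i = 1,\dots,k_1$, corresponding to class 1,
and vectors $b_i \in \mathbb{R}^n$ for $i = 1,\dots,k_2$ corresponding to class 2. If these two vector
sets can be linearly separated, a hyperplane $w^T x = \gamma$ exists with $w \in \mathbb{R}^n$, $\gamma \in \mathbb{R}$
such that

$$
\begin{aligned}
w^T a_i &\ge \gamma, \quad \text{for } i = 1,\dots,k_1 \\
w^T b_i &\le \gamma, \quad \text{for } i = 1,\dots,k_2.
\end{aligned}
$$

To have a “strict” separation, we often prefer to obtain $w$ and $\gamma$ such that

$$
\begin{aligned}
w^T a_i &\ge \gamma + 1, \quad \text{for } i = 1,\dots,k_1 \\
w^T b_i &\le \gamma - 1, \quad \text{for } i = 1,\dots,k_2.
\end{aligned}
$$

In this manner, we find two parallel lines ($w^T x = \gamma + 1$ and $w^T x = \gamma - 1$) that
form the boundaries of the class 1 and class 2 portions of the vector space; see
Figure 5.6.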

Figure 5.6 Linear separation of two classes of data points

There may be several such parallel lines that separate the two classes. Which
one should one choose? A good criterion is to choose the lines that have the
largest margin (distance between the lines). In machine learning, this type of
classification model is known as a support vector machine (Friedman et al., 2001;
Vapnik, 2013).


(i) Consider the following quadratic problem:

$$
\begin{aligned}
\min_{w,\gamma} \quad & \|w\|_2^2 \\
\text{s.t.} \quad & a_i^T w \ge \gamma + 1, \quad \text{for } i = 1,\dots,k_1 \\
& b_i^T w \le \gamma - 1, \quad \text{for } i = 1,\dots,k_2.
\end{aligned}
\tag{5.17}
$$

Minimizing the objective function of this problem is equivalent to maximizing the
margin between the lines $w^T x = \gamma + 1$ and $w^T x = \gamma - 1$ (see Exercise 5.6).


(ii) The linear separation idea we presented above can be used even when
the two vector sets $\{a_i\}$ and $\{b_i\}$ are not linearly separable. (Note that
linearly inseparable sets will result in an infeasible problem in formulation
(5.17).) This is achieved by introducing a non-negative violation variable
for each constraint of (5.17). Then, one has two objectives: to minimize
the total of the constraint violations and to maximize the margin. One
can formulate a quadratic programming model that combines these two
objectives using an adjustable parameter that can be chosen in a way to
put more weight on violations or margin, depending on one’s preference (see
Exercise 5.7).
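The constraints of (5.17) and the role of $\|w\|_2$ can be explored on toy data. The sketch below checks whether a candidate pair $(w, \gamma)$ is feasible for (5.17) and, if so, reports the distance $2/\|w\|_2$ between the two boundary lines, which is the standard formula for the distance between the parallel hyperplanes $w^T x = \gamma + 1$ and $w^T x = \gamma - 1$. It does not solve the quadratic program; the data points and the function name are made up for illustration.

```python
import math


def margin_if_separating(w, gamma, class1, class2):
    """Return 2 / ||w||_2 if (w, gamma) is feasible for (5.17), else None."""

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    ok1 = all(dot(w, a) >= gamma + 1 for a in class1)
    ok2 = all(dot(w, b) <= gamma - 1 for b in class2)
    if not (ok1 and ok2):
        return None
    return 2.0 / math.sqrt(dot(w, w))


# Hypothetical feature vectors for the two classes.
class1 = [(2.0, 2.0), (3.0, 1.0)]
class2 = [(0.0, 0.0), (-1.0, 1.0)]

wide = margin_if_separating((0.5, 0.5), 1.0, class1, class2)
narrow = margin_if_separating((1.0, 1.0), 2.0, class1, class2)
infeasible = margin_if_separating((0.25, 0.25), 0.5, class1, class2)
print(wide, narrow, infeasible)
```

Both feasible candidates separate the data, but the one with the smaller norm of $w$ yields the wider margin, which is why (5.17) minimizes $\|w\|_2^2$.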
Ridge and Lasso Regression


Recall the regression problem described in Example 5.3, namely to estimate the
linear model

$$Y = \beta^T X + \epsilon,$$

where $X$ and $Y$ are $\mathbb{R}^p$-valued and $\mathbb{R}$-valued random variables, by using some
training data $(x_i, y_i)$, with $i = 1,\dots,N$.

We next discuss the case when N < p. This case poses a classical and modern 
challenge in data science. Indeed, this kind of case is increasingly common 
as modern technology facilitates the collection of data. The expression high- 
dimensional problems in the data science literature (Friedman et al., 2001) is 
often used to describe problems where p > N. Examples of high-dimensional 
problems abound in computational biology and genomics, and other instances 
will likely emerge. In those contexts N corresponds to the number of individuals, 
e.g., patients, in some study. Due to physical limitations, N may only be of 
the order of a few hundred. In contrast, the number of features p that can be 
gathered, e.g., gene measurements, could be of the order of tens of thousands. 

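When $N < p$ the $p \times p$ matrix $X^\top X$ has rank at most $N < p$ (here $X$ denotes the
$N \times p$ matrix whose rows are the training inputs $x_i^\top$) and thus the
least-squares approach

\[ \min_{\beta} \|X\beta - y\|_2^2 \]

is inadequate because the optimality conditions lead to an underdetermined
system of equations

\[ (X^\top X)\beta = X^\top y. \]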


We next describe two popular modifications to the ordinary least-squares 
approach that aim to rectify this difficulty, namely ridge regression and lasso 
regression. 

Ridge regression adds a quadratic penalty term to the objective function in
the least-squares model

\[ \min_{\beta} \|X\beta - y\|_2^2 + \lambda \|\beta\|_2^2, \tag{5.18} \]

where $\lambda \ge 0$ is a tuning parameter. The effect of the penalty term is to shrink the
regression coefficients towards zero. The magnitude of $\lambda$ determines the shrinking
effect. In the limit when $\lambda \to \infty$ the solution to the ridge regression model is
$\beta = 0$. On the other hand, when $\lambda = 0$ ridge regression and ordinary least
squares coincide.

The optimality conditions for (5.18) yield the following system of equations:

\[ (X^\top X + \lambda I)\beta - X^\top y = 0. \]

Thus, for $\lambda > 0$, the matrix $X^\top X + \lambda I$ is positive definite and the solution to (5.18) is

\[ \beta = (X^\top X + \lambda I)^{-1} X^\top y. \]

On the other hand, the lasso regression model, proposed in a seminal paper
by Tibshirani (1996), adds a 1-norm penalty term to the objective function in
the least-squares model

\[ \min_{\beta} \|X\beta - y\|_2^2 + \lambda \|\beta\|_1, \tag{5.19} \]

where $\lambda \ge 0$ is a tuning parameter. The effect of the penalty term is again to
shrink the regression coefficients towards zero. However, the properties of the
1-norm have a far more interesting effect. The penalty term $\lambda\|\beta\|_1$ makes some
of the regression coefficients exactly equal to zero. In particular, the solutions to the
lasso regression model (5.19) are typically sparse and the level of sparsity is
controlled by the tuning parameter $\lambda$. Lasso regression can be formulated as a
quadratic program (see Exercise 5.9). Unlike ridge regression, there is no closed-form
formula for the solution to lasso regression.
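The closed-form ridge solution above can be illustrated with a minimal sketch. The toy data (N = 2 observations, p = 3 features) and the helper names `solve`, `ridge`, and `optimality_residual` are assumptions made for this illustration only. The code uses plain Python lists so that it is self-contained; in practice one would normally rely on a numerical linear algebra library.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def gram_plus_penalty(X, lam):
    """Return the p x p matrix X^T X + lam * I."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]


def ridge(X, y, lam):
    """Ridge estimate beta = (X^T X + lam I)^{-1} X^T y, for lam > 0."""
    p = len(X[0])
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(gram_plus_penalty(X, lam), Xty)


def optimality_residual(X, y, lam, beta):
    """Largest absolute entry of (X^T X + lam I) beta - X^T y."""
    p = len(X[0])
    G = gram_plus_penalty(X, lam)
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return max(abs(sum(G[i][j] * beta[j] for j in range(p)) - Xty[i])
               for i in range(p))


def norm(v):
    """Euclidean norm of a vector."""
    return sum(t * t for t in v) ** 0.5


# Hypothetical high-dimensional toy data: fewer observations than features.
X = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0]]
y = [1.0, 2.0]

beta_small = ridge(X, y, 0.1)    # mild shrinkage
beta_large = ridge(X, y, 100.0)  # strong shrinkage towards zero
```

Although $X^\top X$ is singular here (rank at most 2), the penalized system is solvable for any $\lambda > 0$, and the larger penalty yields a coefficient vector of smaller norm.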


Exercises 


Exercise 5.1 Assume $c \in \mathbb{R}^n$ and $Q \in \mathbb{R}^{n \times n}$ is symmetric. Show that the
function

\[ f(x) = \tfrac{1}{2} x^\top Q x + c^\top x \]

is convex if and only if $Q$ is positive semidefinite.
Assume $c \in \mathbb{R}^n$ and $Q \in \mathbb{R}^{n \times n}$ is symmetric and positive semidefinite but not
positive definite. Show that the problem

\[ \min_x \; \tfrac{1}{2} x^\top Q x + c^\top x \]

is either unbounded or has infinitely many optimal solutions.


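Exercise 5.2 Let $c \in \mathbb{R}^n$. Show that the solution to

\[
\begin{array}{rl}
\min_x & \tfrac{1}{2}\|x\|_2^2 - c^\top x \\
\text{s.t.} & \mathbf{1}^\top x = 1 \\
 & x \ge 0
\end{array}
\]

is

\[ x = (\lambda \mathbf{1} + c)^+, \]

where $\lambda \in \mathbb{R}$ is a suitable threshold value such that $\mathbf{1}^\top(\lambda \mathbf{1} + c)^+ = 1$.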


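Exercise 5.3 Let $c, d \in \mathbb{R}^n$. Assume $d > 0$ and $D = \operatorname{Diag}(d)$. Show that the
solution to

\[
\begin{array}{rl}
\min_x & \tfrac{1}{2} x^\top D^{-1} x - c^\top x \\
\text{s.t.} & \mathbf{1}^\top x = 1 \\
 & x \ge 0
\end{array}
\]

is

\[ x = (\lambda d + Dc)^+, \]

where $\lambda \in \mathbb{R}$ is a suitable threshold value such that $\mathbf{1}^\top(\lambda d + Dc)^+ = 1$.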


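Exercise 5.4 Write a CVX MATLAB script that takes as inputs $c \in \mathbb{R}^n$,
$Q \in \mathbb{R}^{n \times n}$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$ and solves the optimization problem

\[
\begin{array}{rl}
\min_x & \tfrac{1}{2} x^\top Q x + c^\top x \\
\text{s.t.} & Ax = b \\
 & x \ge 0.
\end{array}
\]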

Test your script on instances generated as follows: 
>> m=1, n=5, c=randn(n,1), Q=eye(n), A=ones(m,n), b=1; 
and 
>> m=1, n=5, c=randn(n,1), Q=diag(rand(n,1)), A=ones(m,n), b=1; 
Are the results consistent with Exercises 5.2 and 5.3? 

Exercise 5.5 Consider a quadratic program with non-negativity inequality
constraints only:

\[
\begin{array}{rl}
\min_x & \tfrac{1}{2} x^\top Q x + c^\top x \\
\text{s.t.} & x \ge 0.
\end{array}
\tag{5.20}
\]

There is some intuition behind the optimality conditions: at an optimal solution
the non-negativity constraints split into binding and non-binding constraints.
The former behave like equality constraints whereas the latter can be treated as
if they did not exist. Suppose $I \subseteq \{1,\dots,n\}$ is the set of binding constraints and
$J = \{1,\dots,n\} \setminus I$. This reasoning suggests that we think of the problem

\[
\begin{array}{rl}
\min_x & \tfrac{1}{2} x^\top Q x + c^\top x \\
\text{s.t.} & x_I = 0,
\end{array}
\]

whose optimality conditions are

\[
\begin{array}{rcl}
(Qx + c)_I - s_I & = & 0 \\
(Qx + c)_J & = & 0 \\
x_I & = & 0.
\end{array}
\tag{5.21}
\]

Prove that this intuition is indeed correct.

Exercise 5.6 Consider the quadratic problem (5.17) presented in Section 5.6.1.

(a) Show that the objective function of this problem is equivalent to maximizing
the margin between the lines $w^\top x = \gamma + 1$ and $w^\top x = \gamma - 1$.
(b) Write the optimality conditions for problem (5.17).
(c) Write the dual.


Exercise 5.7 The linear separation idea presented in Section 5.6.1 can be used
even when the two vector sets $\{a_i\}$ and $\{b_i\}$ are not linearly separable. This
is achieved by introducing a non-negative violation variable for each constraint
of (5.17). Then, one has two objectives: to minimize the total of the constraint
violations and to maximize the margin. Develop a quadratic programming model
that combines these two objectives using an adjustable parameter that can be
chosen in a way to put more weight on violations or margin, depending on one's
preference.


Exercise 5.8 The classification problems discussed in the two previous exercises
can also be formulated as linear programming problems, if one agrees to
use the 1-norm rather than the 2-norm of $w$ in the objective function. Recall
that $\|w\|_1 = \sum_i |w_i|$. Show that if we replace $\|w\|_2^2$ by $\|w\|_1$ in the objective
function of (5.17), we can write the resulting problem as a linear program. Show
also that this new objective function is equivalent to maximizing the distance
between $w^\top x = \gamma + 1$ and $w^\top x = \gamma - 1$ if one measures the distance using the
$\infty$-norm $\|g\|_\infty = \max_i |g_i|$.


Exercise 5.9 Show that the lasso regression model (5.19) can be equivalently
formulated as

\[
\begin{array}{rl}
\min_x & \tfrac{1}{2} x^\top Q x + c^\top x \\
\text{s.t.} & Dx \ge d
\end{array}
\]

for some suitable $Q$, $c$, $D$, $d$.


Quadratic Programming Models:
Mean-Variance Optimization


Portfolio Return 


Consider an investment environment where there is a universe of $n$ risky assets.
In the next few chapters we will be concerned with a one-period model of the
problem of investing in these $n$ risky assets. Assume a portfolio must be selected
at some initial time $t_0$ and held until time $t$. Let $v_0 = [v_{1,0} \; \cdots \; v_{n,0}]^\top$ and
$v = [v_1 \; \cdots \; v_n]^\top$ denote the vectors of asset prices at times $t_0$ and $t$ respectively.
The vector $v_0$ is known whereas $v$ is a vector of random variables. A vector
$h \in \mathbb{R}^n$ of share holdings in each of the assets defines a portfolio whose values at
time $t_0$ and $t$ are $W_0 := v_0^\top h$ and $W := v^\top h$ respectively. The value $W_0$ is known
at time $t_0$ whereas $W$ is a random variable. The gist of portfolio construction is
to choose $h$ to optimize some measure of satisfaction on the random variable $W$.
It is customary to use the initial portfolio value $W_0$ as a reference and to write
the above problem in terms of the portfolio return

\[ r_P = \frac{W - W_0}{W_0}. \]

The return of asset $i$, which is the same as that of a portfolio entirely invested
in asset $i$, is similarly defined as

\[ r_i = \frac{v_i - v_{i,0}}{v_{i,0}}. \]

Instead of the vector of holdings $h \in \mathbb{R}^n$, the portfolio construction problem is
often stated in terms of percentage holdings $x \in \mathbb{R}^n$ where

\[ x_i = \frac{h_i v_{i,0}}{W_0} = \frac{h_i v_{i,0}}{\sum_{j=1}^n h_j v_{j,0}}. \]

Observe that $W = v^\top h$ can be equivalently written as

\[ r_P = r^\top x, \]

where $r = [r_1 \; \cdots \; r_n]^\top$ is the vector of asset returns.

In spite of its wide popularity, this convention runs into difficulties in some
cases. For example, the above quantity $r_P$ does not make sense for a long-short
portfolio associated with a pairs trading strategy. More broadly, the quantity $r_P$
is not well defined when the initial value of a portfolio $W_0$ is
zero, as when one enters a futures contract or constructs a long-short portfolio
with equal long and short cash positions.

As Meucci (2005, 2010) nicely puts it, this difficulty can be remedied by
assuming that returns are measured relative to some predefined basis value $b$ as
opposed to the initial portfolio value $W_0$. In some cases, it is natural to choose
$b = W_0$ but it is more proper to think of $b$ as a general reference point. To make
this idea more precise, we associate with each asset and portfolio a basis $b$ that
satisfies the following four properties:

• The basis $b$ for a long position of an asset is positive.
• The basis $b$ is measured in the same unit as the asset values.
• The basis is homogeneous: the basis of $k$ shares of an asset is $k$ times the basis
  of one share.
• The basis is known at time $t_0$.


Equipped with this concept, we get a formal and unambiguous definition of asset
and portfolio returns:

\[ r_i = \frac{v_i - v_{i,0}}{b_i}, \qquad r_P = \frac{W - W_0}{b_P}. \]

Likewise, we obtain a formal and unambiguous definition of percentage holdings:

\[ x_i = \frac{h_i b_i}{b_P}. \]

Once again, the identity $W = v^\top h$ can be equivalently written as

\[ r_P = r^\top x. \]

Throughout this chapter $x = [x_1 \; \cdots \; x_n]^\top$ will denote the vector of percentage
holdings of a portfolio in a universe of $n$ risky assets. When it is applicable
and evident from the context, we shall assume the usual basis values $b_i = v_{i,0}$
and $b_P = W_0$ respectively.
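The following sketch checks the identity $r_P = r^\top x$ numerically, first with the usual bases and then for a dollar-neutral pair whose initial value is zero. It assumes that basis-relative returns are value changes divided by the basis and that holdings are $x_i = h_i b_i / b_P$. The prices, holdings, the function name `returns_and_weights`, and the choice of the long-leg value as portfolio basis are hypothetical.

```python
def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(a * b for a, b in zip(u, v))


def returns_and_weights(h, v0, v, b=None, b_P=None):
    """Return asset returns r, percentage holdings x, and portfolio return r_P.

    b   : per-share asset bases (defaults to the initial prices v0)
    b_P : portfolio basis (defaults to the initial portfolio value W0)
    """
    b = list(v0) if b is None else list(b)
    W0, W = dot(v0, h), dot(v, h)
    b_P = W0 if b_P is None else b_P
    r = [(vi - v0i) / bi for vi, v0i, bi in zip(v, v0, b)]
    x = [hi * bi / b_P for hi, bi in zip(h, b)]
    r_P = (W - W0) / b_P
    return r, x, r_P


# Long-only example with the usual bases b_i = v_{i,0} and b_P = W_0.
r, x, r_P = returns_and_weights(h=[100, 50, 20],
                                v0=[10.0, 20.0, 50.0],
                                v=[11.0, 19.0, 55.0])

# Dollar-neutral pair: W_0 = 0, so the usual return is undefined.
# The basis b_P = 1000 (value of the long leg) is an illustrative choice.
r_ls, x_ls, r_P_ls = returns_and_weights(h=[100, -50],
                                         v0=[10.0, 20.0],
                                         v=[11.0, 19.0],
                                         b_P=1000.0)
```

In the second example the percentage holdings are 1 and -1, which do not add up to one; with a general basis the holdings need not sum to one.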


Markowitz Mean-Variance (Basic Model)


Markowitz's key insight into the above one-period investment problem was to
consider the expected value and standard deviation of the return as measures of
performance and risk respectively. The portfolio selection problem can then be
formally stated as a quadratic programming model. To simplify our discussion
of this model, we will proceed in three incremental steps. First, we will look at
the case when there are only two assets; second, we will look at the case when
there are three risky assets; and finally, we will see the general case with any
number of risky assets.


Two Assets


Suppose we are combining two assets whose random returns are $r_1$ and $r_2$. Let

\[ \mu_1 := \mathbb{E}(r_1), \quad \mu_2 := \mathbb{E}(r_2), \]

and

\[ \sigma_1^2 := \operatorname{var}(r_1), \quad \sigma_2^2 := \operatorname{var}(r_2), \quad \sigma_{12} := \operatorname{cov}(r_1, r_2) = \rho\,\sigma_1\sigma_2. \]

In this case a portfolio of these two assets is determined by the proportion
invested in one of the two assets. Let $x$ denote the proportion in asset 1. Thus
the portfolio return is

\[ r_P = x\, r_1 + (1 - x)\, r_2, \]

the portfolio expected return is

\[ \mu_P := \mathbb{E}(r_P) = x\,\mathbb{E}(r_1) + (1 - x)\,\mathbb{E}(r_2) = x\,\mu_1 + (1 - x)\,\mu_2, \]

and the portfolio variance is

\[ \sigma_P^2 = x^2\sigma_1^2 + (1 - x)^2\sigma_2^2 + 2x(1 - x)\,\rho\,\sigma_1\sigma_2. \]

In the special case when one of the assets, say asset 2, is the asset with risk-free
return $r_f$ we get

\[ \mu_P = x\,\mu_1 + (1 - x)\, r_f = r_f + x(\mu_1 - r_f), \qquad \sigma_P^2 = x^2\sigma_1^2. \]

In this case the portfolio selection is particularly simple: a target level of expected
return $\mu_P$ corresponds to one particular portfolio obtained by choosing
$x = (\mu_P - r_f)/(\mu_1 - r_f)$. The case of three assets is more interesting.
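The two-asset formulas above translate directly into code. The sketch below is only an illustration: the function names and all numerical inputs (expected returns, volatilities, correlation, risk-free rate) are hypothetical.

```python
import math


def two_asset_portfolio(x, mu1, mu2, sigma1, sigma2, rho):
    """Expected return and standard deviation with weight x in asset 1."""
    mu_p = x * mu1 + (1 - x) * mu2
    var_p = (x ** 2 * sigma1 ** 2 + (1 - x) ** 2 * sigma2 ** 2
             + 2 * x * (1 - x) * rho * sigma1 * sigma2)
    return mu_p, math.sqrt(var_p)


def weight_for_target(mu_target, mu1, r_f):
    """Weight in the risky asset that attains mu_target when asset 2 is risk-free."""
    return (mu_target - r_f) / (mu1 - r_f)


# Two risky assets, equally weighted.
mu_mix, sd_mix = two_asset_portfolio(0.5, 0.10, 0.06, 0.20, 0.10, 0.3)

# Risky asset combined with a risk-free asset (sigma2 = 0, rho irrelevant).
x_target = weight_for_target(0.06, 0.10, 0.02)
mu_rf, sd_rf = two_asset_portfolio(x_target, 0.10, 0.02, 0.20, 0.0, 0.0)
```

With a correlation of 0.3 the standard deviation of the equally weighted mix is below the average of the two individual standard deviations, which is the diversification effect that the variance formula captures.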


Three Risky Assets 


Suppose now that there are three assets with random returns $r_1$, $r_2$, and $r_3$. As
before, let

\[ \mu_j := \mathbb{E}(r_j), \quad \sigma_j^2 := \operatorname{var}(r_j) \quad \text{for } j = 1, 2, 3, \]

and

\[ \sigma_{ij} := \operatorname{cov}(r_i, r_j) = \rho_{ij}\,\sigma_i\sigma_j \quad \text{for } i, j = 1, 2, 3. \]

Now a portfolio determines the holdings in the three assets. Let $x_j$ denote
the proportion (weight) invested in asset $j$, for $j = 1, 2, 3$. Notice that these
proportions should add up to one if the portfolio is fully invested in the three
assets:

\[ x_1 + x_2 + x_3 = 1. \]

Similar to what we did before, the portfolio return is

\[ r_P = r_1 x_1 + r_2 x_2 + r_3 x_3. \]

So the portfolio expected return is

\[ \mu_P = \mu_1 x_1 + \mu_2 x_2 + \mu_3 x_3, \]

and the portfolio variance is

\[ \sigma_P^2 = \sigma_1^2 x_1^2 + \sigma_2^2 x_2^2 + \sigma_3^2 x_3^2 + 2(\sigma_{12} x_1 x_2 + \sigma_{23} x_2 x_3 + \sigma_{13} x_1 x_3). \]


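Observe that now there are multiple portfolios that can achieve a target expected
level of return. A portfolio is efficient if it has minimum risk for a given target
return, or equivalently, if it has the maximum expected return for a given target
risk. This naturally leads to the following quadratic programming formulation.

To find a portfolio of minimum risk (variance) with expected return at least $\bar\mu$
solve the following mean-variance optimization model:

\[
\begin{array}{rl}
\min & \displaystyle\sum_{i=1}^{3} \sigma_i^2 x_i^2 + 2\sum_{i=1}^{3}\sum_{j=i+1}^{3} \sigma_{ij} x_i x_j \\
\text{s.t.} & \mu_1 x_1 + \mu_2 x_2 + \mu_3 x_3 \ge \bar\mu \\
 & x_1 + x_2 + x_3 = 1.
\end{array}
\]

The efficient frontier is the set of efficient portfolios. The efficient frontier is often
"visualized" by plotting the expected return against the standard deviation of
the efficient portfolios. To generate portfolios on the efficient frontier, we can
minimize variance, for varying target return $\bar\mu$:

\[
\begin{array}{rl}
\min & \displaystyle\sum_{i=1}^{3} \sigma_i^2 x_i^2 + 2\sum_{i=1}^{3}\sum_{j=i+1}^{3} \sigma_{ij} x_i x_j \\
\text{s.t.} & \mu_1 x_1 + \mu_2 x_2 + \mu_3 x_3 \ge \bar\mu \\
 & x_1 + x_2 + x_3 = 1.
\end{array}
\]

We can also maximize return, for varying target variance $\bar\sigma^2 > 0$:

\[
\begin{array}{rl}
\max & \mu_1 x_1 + \mu_2 x_2 + \mu_3 x_3 \\
\text{s.t.} & \displaystyle\sum_{i=1}^{3} \sigma_i^2 x_i^2 + 2\sum_{i=1}^{3}\sum_{j=i+1}^{3} \sigma_{ij} x_i x_j \le \bar\sigma^2 \\
 & x_1 + x_2 + x_3 = 1.
\end{array}
\]

Or we can maximize quadratic utility, for varying risk aversion $\gamma > 0$:

\[
\begin{array}{rl}
\max & \mu_1 x_1 + \mu_2 x_2 + \mu_3 x_3 - \dfrac{\gamma}{2}\left(\displaystyle\sum_{i=1}^{3} \sigma_i^2 x_i^2 + 2\sum_{i=1}^{3}\sum_{j=i+1}^{3} \sigma_{ij} x_i x_j\right) \\
\text{s.t.} & x_1 + x_2 + x_3 = 1.
\end{array}
\]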

Any Number of Risky Assets 


Let us now take a leap to the most general case. Assume we have $n$ risky assets.
Let $r \in \mathbb{R}^n$ be the $n$-dimensional random vector of returns, i.e., $r_i$ denotes the
return of asset $i$ between times $t_0$ and $t$. Let $\mu \in \mathbb{R}^n$ denote the vector of expected
returns, and $V \in \mathbb{R}^{n \times n}$ denote the return covariance matrix. More precisely,

\[
\mu = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_n \end{bmatrix}, \qquad
V = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} \end{bmatrix},
\]

where $\mu_i := \mathbb{E}(r_i)$, $\sigma_{ij} := \operatorname{cov}(r_i, r_j)$, $i, j = 1,\dots,n$.
From the linearity properties of expectation, it follows that the expected return
and variance of a given portfolio $x = [x_1 \; \cdots \; x_n]^\top$ of the risky assets are
respectively

\[ \mu^\top x = \sum_{j=1}^{n} \mu_j x_j \]

and

\[ x^\top V x = \sum_{i=1}^{n}\sum_{j=1}^{n} \sigma_{ij} x_i x_j = \sum_{i=1}^{n} \sigma_{ii} x_i^2 + 2\sum_{i=1}^{n}\sum_{j=i+1}^{n} \sigma_{ij} x_i x_j. \]

The problem of selecting a portfolio can be formally stated as a tradeoff between
these two components. A fully invested portfolio is efficient if it has minimum
risk for a given level of return, or equivalently if it has maximum expected return
for a given level of risk.


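A fully invested efficient portfolio can then be characterized as the solution to
the following quadratic program:

\[
\begin{array}{rl}
\max_x & \mu^\top x - \tfrac{1}{2}\gamma\, x^\top V x \\
\text{s.t.} & \mathbf{1}^\top x = 1
\end{array}
\tag{6.1}
\]

for some risk-aversion coefficient $\gamma > 0$.
The set of efficient portfolios can also be obtained as the set of solutions to
the quadratic program:

\[
\begin{array}{rl}
\min_x & x^\top V x \\
\text{s.t.} & \mu^\top x \ge \bar\mu \\
 & \mathbf{1}^\top x = 1,
\end{array}
\tag{6.2}
\]

and also as the set of solutions to

\[
\begin{array}{rl}
\max_x & \mu^\top x \\
\text{s.t.} & x^\top V x \le \bar\sigma^2 \\
 & \mathbf{1}^\top x = 1
\end{array}
\tag{6.3}
\]

by varying $\bar\mu$ and $\bar\sigma$ respectively. The exercises at the end of the chapter sketch
how to give a formal proof of the equivalence of the above three models.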

We shall refer to the equivalent mean-variance models (6.1), (6.2), and (6.3) as
the basic mean-variance models as they include only the following three essential
components: mean and variance of return, and the full investment constraint.
Observe that these three optimization models are convex because the quadratic
function $x \mapsto x^\top V x$ is convex as the covariance matrix $V$ is positive semidefinite.
Section 6.3 below details several interesting insights that can be gained from the
solution to these basic mean-variance models.

As we discuss later in this chapter, the types of mean-variance models used 
in portfolio construction typically include a number of additional constraints. 


Asset Allocation and Security Selection 


There are two distinct levels of portfolio analysis that are amenable to mean- 
variance models. The conventional top-down investment approach to portfolio 
construction consists of two main steps, namely asset allocation and security 
selection. 

On the one hand, the asset allocation decision is concerned with portfolio 
choices among broad asset classes. At the coarsest level, these asset classes could 
be stocks, bonds, and cash. At a more refined level, some of these broad asset 
classes could be subdivided. For instance, stocks can be divided according to 
geography or market capitalization. The asset allocation decision involves only 
a small number of assets, typically ranging from a handful to a dozen or so. It 
generally involves simple constraints such as budget constraints and upper and 
lower bounds on individual positions. 

On the other hand, the security selection decision is concerned with the specific 
securities within each particular asset class. For instance, if the relevant asset 
class is equities in the S&P 500 market index, then the security selection problem 
is concerned with the specific portfolio holdings at the individual stock level. 
The security selection problem typically involves a large number of securities, 
ranging from a few hundred to potentially thousands. It also involves a myriad 
of constraints and is often formulated relative to a predefined benchmark, as we 
discuss in more detail in Section 6.5. 


Analytical Solutions to Basic Mean-Variance Models


The solution to the basic mean-variance models described in Section 6.2 can
be characterized by relying on the tools introduced in Chapter 5. Throughout
this section we assume that the covariance matrix of asset returns $V$ is positive
definite. In particular, $V^{-1}$ exists.


Minimum Risk and Characteristic Portfolios 


Consider the simplified version of (6.1) that is obtained in the limit when $\gamma \to \infty$:

\[
\begin{array}{rl}
\min_x & x^\top V x \\
\text{s.t.} & \mathbf{1}^\top x = 1.
\end{array}
\tag{6.4}
\]

The model (6.4) corresponds to the problem of finding the minimum-risk fully
invested portfolio. We discussed this problem in Example 5.7 where the optimal
solution was shown to be

\[ x^* = \frac{1}{\mathbf{1}^\top V^{-1}\mathbf{1}}\, V^{-1}\mathbf{1}. \]

A related problem that is often of interest is to find the minimum-risk portfolio
with unit exposure to a vector of attributes $a$ associated with the assets. As
we will see later, some interesting attributes could be the betas of the assets
relative to a benchmark, the asset volatilities, or the asset expected returns. The
characteristic portfolio of a vector of attributes $a$ is the solution to the problem

\[
\begin{array}{rl}
\min_x & x^\top V x \\
\text{s.t.} & a^\top x = 1.
\end{array}
\tag{6.5}
\]

Using the solution of (5.9) obtained in Chapter 5, it follows that the solution to
(6.5) is

\[ x^* = \frac{1}{a^\top V^{-1} a}\, V^{-1} a. \]

Observe that a characteristic portfolio $x^* = (1/a^\top V^{-1} a)\,V^{-1} a$ is not necessarily
fully invested as its components may not add up to one. Observe also that
the variance of the characteristic portfolio $x^* = (1/a^\top V^{-1} a)\,V^{-1} a$ is

\[ (x^*)^\top V x^* = \frac{1}{a^\top V^{-1} a}. \]
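
As a numerical check of the two formulas above, the following sketch computes characteristic portfolios for a hypothetical three-asset covariance matrix. The matrix `V`, the vector `mu`, and the helper names are assumptions for illustration; the linear solver is written out in plain Python to keep the example self-contained.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def characteristic_portfolio(V, a):
    """Return x* = V^{-1} a / (a^T V^{-1} a) and its variance 1 / (a^T V^{-1} a)."""
    w = solve(V, a)          # w = V^{-1} a
    scale = dot(a, w)        # a^T V^{-1} a, positive when V is positive definite
    return [wi / scale for wi in w], 1.0 / scale


# Hypothetical positive definite covariance matrix and expected returns.
V = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
mu = [0.05, 0.08, 0.12]

x_min, var_min = characteristic_portfolio(V, [1.0, 1.0, 1.0])  # minimum-risk portfolio
x_mu, var_mu = characteristic_portfolio(V, mu)                 # unit exposure to mu
```

The portfolio `x_min` is fully invested because its attribute vector is the vector of ones, whereas `x_mu` has unit expected return but its weights generally do not sum to one.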


Two-Fund Separation Theorem 


Consider the basic mean-variance model

\[
\begin{array}{rl}
\max_x & \mu^\top x - \tfrac{1}{2}\gamma\, x^\top V x \\
\text{s.t.} & \mathbf{1}^\top x = 1
\end{array}
\tag{6.6}
\]

for some risk-aversion coefficient $\gamma > 0$. We next derive an interesting result
often called the two-fund separation theorem. The theorem states that every fully
invested efficient portfolio is a combination of two particular efficient portfolios.
Applying the optimality conditions (5.10) from Theorem 5.6 to problem (6.6)
we obtain the solution

\[ x^* = \lambda \cdot \frac{1}{\mathbf{1}^\top V^{-1}\mu}\, V^{-1}\mu + (1 - \lambda) \cdot \frac{1}{\mathbf{1}^\top V^{-1}\mathbf{1}}\, V^{-1}\mathbf{1}, \]

where $\lambda = \mathbf{1}^\top V^{-1}\mu / \gamma$. The following two-fund theorem readily follows.


Theorem 6.1 (Two-fund theorem) Consider model (6.6) for some $\gamma > 0$. There
exist two efficient portfolios (funds), namely

\[ \frac{1}{\mathbf{1}^\top V^{-1}\mu}\, V^{-1}\mu \quad \text{and} \quad \frac{1}{\mathbf{1}^\top V^{-1}\mathbf{1}}\, V^{-1}\mathbf{1}, \]

such that every efficient portfolio, that is, every solution to (6.6), is a combination
of these two portfolios.


Observe that one of the two portfolios in the two-fund theorem is the minimum-risk
portfolio $(1/\mathbf{1}^\top V^{-1}\mathbf{1})\,V^{-1}\mathbf{1}$ and the other one is a multiple of the
characteristic portfolio $(1/\mu^\top V^{-1}\mu)\,V^{-1}\mu$ of the vector of attributes $\mu$.
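
The two-fund formula can be checked numerically by verifying the optimality conditions of the equality-constrained problem: at a solution, $\mu - \gamma V x$ must be a multiple of the vector of ones and the weights must sum to one. The sketch below does this for hypothetical data; `V`, `mu`, and all function names are assumptions for illustration.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Hypothetical positive definite covariance matrix and expected returns.
V = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
mu = [0.05, 0.08, 0.12]

Vinv_mu = solve(V, mu)                  # V^{-1} mu
Vinv_one = solve(V, [1.0, 1.0, 1.0])    # V^{-1} 1

# The two funds: each is normalised so that its weights sum to one.
fund_mu = [w / sum(Vinv_mu) for w in Vinv_mu]
fund_min = [w / sum(Vinv_one) for w in Vinv_one]


def efficient_portfolio(gamma):
    """Two-fund combination with lambda = 1^T V^{-1} mu / gamma."""
    lam = sum(Vinv_mu) / gamma
    return [lam * a + (1 - lam) * b for a, b in zip(fund_mu, fund_min)]


def kkt_gap(x, gamma):
    """Spread of the entries of mu - gamma V x (zero iff it is a multiple of 1)."""
    g = [m - gamma * vx for m, vx in zip(mu, matvec(V, x))]
    return max(g) - min(g)


portfolios = {gamma: efficient_portfolio(gamma) for gamma in (1.0, 4.0, 10.0)}
```

As the risk aversion grows, the weight $\lambda$ on the first fund shrinks and the efficient portfolio approaches the minimum-risk fund.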


One-Fund Separation Theorem 


We next derive the one-fund or mutual fund separation theorem. This result is 
similar in spirit to the two-fund separation theorem. It states that if there is a 
risk-free asset, then every efficient portfolio is a combination of the risk-free asset 
and a particular fund. 

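Consider the case when, in addition to the universe of $n$ risky assets, there
is an additional asset $n + 1$ with risk-free return $r_f$. In this case, problem (6.1)
extends as follows:

\[
\begin{array}{rl}
\max_{x,\,x_{n+1}} & \mu^\top x + r_f\, x_{n+1} - \tfrac{1}{2}\gamma\, x^\top V x \\
\text{s.t.} & \mathbf{1}^\top x + x_{n+1} = 1.
\end{array}
\tag{6.7}
\]

By substituting $x_{n+1} = 1 - \mathbf{1}^\top x$ in the objective and dropping the constraint,
problem (6.7) can be rewritten, up to the additive constant $r_f$, as the following
unconstrained optimization problem:

\[ \max_x \; (\mu - r_f\mathbf{1})^\top x - \tfrac{1}{2}\gamma\, x^\top V x. \]

Applying the optimality conditions (5.4) from Theorem 5.2, we obtain the
following solution to (6.7):

\[ x^* = \frac{1}{\gamma}\, V^{-1}(\mu - r_f\mathbf{1}) = \lambda \cdot \frac{1}{\mathbf{1}^\top V^{-1}(\mu - r_f\mathbf{1})}\, V^{-1}(\mu - r_f\mathbf{1}), \qquad x^*_{n+1} = 1 - \mathbf{1}^\top x^*, \]

where $\lambda = \mathbf{1}^\top V^{-1}(\mu - r_f\mathbf{1})/\gamma$. The following one-fund theorem readily follows.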


Theorem 6.2 (One-fund theorem) Suppose the investment universe includes
$n$ risky assets and a risk-free asset. Then there exists a fully invested efficient
portfolio (fund), namely

\[ \frac{1}{\mathbf{1}^\top V^{-1}(\mu - r_f\mathbf{1})}\, V^{-1}(\mu - r_f\mathbf{1}), \]

such that every efficient portfolio, that is, every solution to (6.7) for some $\gamma > 0$,
is a combination of this portfolio and the risk-free asset.


The portfolio $[1/\mathbf{1}^\top V^{-1}(\mu - r_f\mathbf{1})]\,V^{-1}(\mu - r_f\mathbf{1})$ is called the tangency
portfolio. This name is motivated by the geometric interpretation illustrated in
Figure 6.1. Consider the plot of expected return versus standard deviation for
the efficient frontier portfolios. The portfolio $[1/\mathbf{1}^\top V^{-1}(\mu - r_f\mathbf{1})]\,V^{-1}(\mu - r_f\mathbf{1})$
lies exactly at the tangency point on this frontier defined by the straight line
emerging from the point $(0, r_f)$. The point $(0, r_f)$ corresponds to the expected
return versus standard deviation of the risk-free asset. The tangency line is also
known as the capital allocation line (CAL) as it corresponds to portfolios with
different allocations of capital between the tangency portfolio and the risk-free
asset.


Figure 6.1 Tangency portfolio (axes: mean versus standard deviation; labels: tangency line, tangency portfolio)
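
The one-fund theorem can be illustrated with a short sketch: for any risk aversion, the risky part of the optimal allocation is a scalar multiple of the tangency portfolio, and the remainder is held in the risk-free asset. The data (`V`, `mu`, `r_f`) and the function names below are hypothetical.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


# Hypothetical inputs: covariance matrix, expected returns, risk-free return.
V = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
mu = [0.05, 0.08, 0.12]
r_f = 0.02

excess = [m - r_f for m in mu]
w = solve(V, excess)                      # V^{-1}(mu - r_f 1)
tangency = [wi / sum(w) for wi in w]      # fully invested risky fund


def optimal_allocation(gamma):
    """Risky holdings x* = V^{-1}(mu - r_f 1) / gamma and risk-free holding."""
    risky = [wi / gamma for wi in w]
    return risky, 1.0 - sum(risky)


allocations = {gamma: optimal_allocation(gamma) for gamma in (2.0, 5.0)}
```

A more risk-averse investor (larger `gamma`) holds the same risky mix in a smaller proportion and places more capital in the risk-free asset.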


Capital Asset Pricing Model (CAPM) 


Under suitable equilibrium assumptions the tangency portfolio discussed above
yields the main mathematical foundation for the capital asset pricing model
(CAPM), a fundamental asset pricing model in financial economics. The key
step in this derivation is that, in equilibrium, the tangency portfolio is precisely
the market portfolio $x_M$. That is,

\[ x_M = \frac{1}{\mathbf{1}^\top V^{-1}(\mu - r_f\mathbf{1})}\, V^{-1}(\mu - r_f\mathbf{1}). \tag{6.8} \]

From (6.8) we readily obtain

\[ V x_M = \frac{1}{\mathbf{1}^\top V^{-1}(\mu - r_f\mathbf{1})}\,(\mu - r_f\mathbf{1}), \tag{6.9} \]

and

\[ x_M^\top V x_M = \frac{(\mu - r_f\mathbf{1})^\top x_M}{\mathbf{1}^\top V^{-1}(\mu - r_f\mathbf{1})} = \frac{\mu_M - r_f}{\mathbf{1}^\top V^{-1}(\mu - r_f\mathbf{1})}, \tag{6.10} \]

where $\mu_M = \mu^\top x_M$ is the expected value of the market portfolio return.

Combining (6.9) and (6.10) we get

\[ \mu - r_f\mathbf{1} = \mathbf{1}^\top V^{-1}(\mu - r_f\mathbf{1})\, V x_M = \frac{V x_M}{x_M^\top V x_M}\,(\mu_M - r_f) = \beta \cdot (\mu_M - r_f), \tag{6.11} \]

where $\beta = (1/x_M^\top V x_M)\, V x_M$. The above can be equivalently stated as

\[ \mu_j - r_f = \beta_j(\mu_M - r_f), \quad \text{where } \beta_j = \frac{\sigma_{jM}}{\sigma_M^2} \text{ for } j = 1,\dots,n. \tag{6.12} \]

Equation (6.11) or its equivalent (6.12) is the formal statement of the capital
asset pricing model (CAPM). The CAPM postulates that the excess return of
asset $j$ is determined entirely by its beta coefficient times the excess return of
the market.

In the expression (6.12), $\sigma_{jM}$ denotes the covariance between the return of
asset $j$ and the return of the market portfolio, and $\sigma_M^2$ denotes the variance of
the market portfolio return. These two quantities in turn have the following
expressions in terms of the covariance matrix $V$:

\[ \sigma_{jM} = \operatorname{cov}(r_j, r_M) = (V x_M)_j, \qquad \sigma_M^2 = \operatorname{var}(r_M) = x_M^\top V x_M. \]
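
The identity (6.12) can be verified numerically by treating the tangency portfolio as the market portfolio. The following sketch does so for hypothetical inputs; `V`, `mu`, `r_f`, and the helper names are assumptions made for this illustration.

```python
def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


# Hypothetical inputs: covariance matrix, expected returns, risk-free return.
V = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
mu = [0.05, 0.08, 0.12]
r_f = 0.02

# Market portfolio taken to be the tangency portfolio, as in (6.8).
w = solve(V, [m - r_f for m in mu])
x_M = [wi / sum(w) for wi in w]

cov_with_market = matvec(V, x_M)            # sigma_{jM} = (V x_M)_j
var_M = dot(x_M, cov_with_market)           # sigma_M^2 = x_M^T V x_M
beta = [c / var_M for c in cov_with_market]
mu_M = dot(mu, x_M)                         # expected market return

# Largest deviation from mu_j - r_f = beta_j (mu_M - r_f) over all assets.
capm_error = max(abs((m - r_f) - b * (mu_M - r_f)) for m, b in zip(mu, beta))
```

By construction the beta of the market portfolio itself, $\beta^\top x_M$, equals one.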


More General Mean-Variance Models


The basic mean-variance model discussed in the previous section provides the 
foundation of modern portfolio theory. However, when mean-variance models 
are used as a normative tool in portfolio construction, it is common to use 
modifications of the basic model by including additional constraints and possibly 
additional terms in the objective. 


Common Constraints 


Aside from a target expected return or a target variance, the only portfolio
constraint in the basic mean-variance model is the full investment constraint

\[ \mathbf{1}^\top x = 1. \]

Furthermore, this constraint disappears if the portfolio is allowed to include
holdings in a risk-free asset. In both cases the individual portfolio holdings could
in principle take arbitrary positive and negative values as there is no explicit
restriction on them. This motivates the following types of constraints that are
often included in a mean-variance model:


• Budget constraints, such as fully invested portfolios.
• Upper and/or lower bounds on the size of individual positions.
• Upper and/or lower bounds on exposure to industries or sectors.
• Leverage constraints such as long-only, or 130/30 constraints.
• Turnover constraints.

The above types of constraints replace the single portfolio constraint

\[ \mathbf{1}^\top x = 1 \]

by a more elaborate set of constraints of the form

\[ Ax = b, \qquad Dx \ge d. \]

Consequently, we get the following general version of the basic mean-variance
model (6.1):

\[
\begin{array}{rl}
\max_x & \mu^\top x - \tfrac{1}{2}\gamma\, x^\top V x \\
\text{s.t.} & Ax = b \\
 & Dx \ge d.
\end{array}
\tag{6.13}
\]


The set of portfolios obtained via the model (6.13) can also be obtained via the
following two equivalent models. The first one enforces a target expected return:

\[
\begin{array}{rl}
\min_x & x^\top V x \\
\text{s.t.} & \mu^\top x \ge \bar\mu \\
 & Ax = b \\
 & Dx \ge d.
\end{array}
\tag{6.14}
\]

The second one enforces a target variance of return:

\[
\begin{array}{rl}
\max_x & \mu^\top x \\
\text{s.t.} & x^\top V x \le \bar\sigma^2 \\
 & Ax = b \\
 & Dx \ge d.
\end{array}
\tag{6.15}
\]

The models (6.13), (6.14), and (6.15) are still convex quadratic optimization
models. Unlike the basic mean-variance model, they generally do not have an
analytical closed-form solution due to the additional inequality constraints. However,
they can be solved numerically very efficiently via optimization solvers.

We next discuss how some of the above five types of constraints can be
incorporated into a mean-variance model. The first three types of constraints
have straightforward formulations. We concentrate on the last two, namely,
leverage constraints and turnover constraints. A long-only constraint can readily
be enforced via $x \ge 0$. A relaxed version of this constraint, popular in certain
contexts, is not to rule out leverage altogether but to limit it. For instance, a
"130/30" leverage constraint means that the total value of the holdings in short
positions must be at most 30% of the portfolio value. In general, suppose that
we want the value of the total short positions to be at most $L$. This means that
we want to enforce the following restriction:

\[ \sum_{j=1}^{n} \min(x_j, 0) \ge -L \quad \Longleftrightarrow \quad \sum_{j=1}^{n} \max(-x_j, 0) \le L. \]

Although this is a correct mathematical formulation of the constraint, it is not
ideal for computational purposes because of the non-smooth terms $\max(-x_j, 0)$.
In particular, if a constraint were written in this form the resulting mean-variance
model would not be a quadratic program. To formulate this constraint
efficiently in the quadratic optimization model, we trade terms of the form
$\max(-x_j, 0)$ for new terms involving possibly new variables and linear inequalities.
To that end, add the new vector of variables $y = [y_1 \; \cdots \; y_n]^\top$ and
constraints

\[ x \ge -y, \qquad \sum_{j=1}^{n} y_j \le L, \qquad y \ge 0. \]

A turnover constraint is a constraint on the total change in the portfolio
positions. This constraint is generally included as a way to limit certain kinds
of costs such as taxes and transaction costs. Suppose that we have an initial
portfolio $x^0 = [x_1^0 \; \cdots \; x_n^0]^\top$ and we want to ensure that the new portfolio
incurs a total turnover no larger than $h$. This means that we want to enforce the
restriction

\[ \sum_{j=1}^{n} |x_j^0 - x_j| \le h. \]

To formulate this constraint efficiently in the quadratic optimization model, add
the new vector of variables $y = [y_1 \; \cdots \; y_n]^\top$ and constraints

\[ x - x^0 \le y, \qquad x^0 - x \le y, \qquad \sum_{j=1}^{n} y_j \le h \]

(see Exercise 6.3). The total turnover $\sum_{j=1}^{n} |x_j^0 - x_j|$ is also sometimes called the
two-sided turnover.
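
The two linear reformulations can be sketched as follows. For a fixed portfolio, the componentwise smallest admissible auxiliary vector is $y_j = \max(-x_j, 0)$ for the short-position limit and $y_j = |x_j - x_j^0|$ for the turnover limit, so the linear system is feasible exactly when the original non-smooth constraint holds. The turnover system used here is the standard absolute-value reformulation and is an assumption of this sketch, as are the function names and the example portfolios.

```python
TOL = 1e-12  # tolerance for floating-point comparisons


def short_limit_certificate(x, L):
    """Smallest y with x >= -y and y >= 0, and whether sum(y) <= L."""
    y = [max(-xj, 0.0) for xj in x]
    return y, sum(y) <= L + TOL


def turnover_certificate(x, x0, h):
    """Smallest y with x - x0 <= y and x0 - x <= y, and whether sum(y) <= h."""
    y = [abs(xj - x0j) for xj, x0j in zip(x, x0)]
    return y, sum(y) <= h + TOL


def satisfies_short_system(x, y, L):
    """Check the linear system x >= -y, sum(y) <= L, y >= 0 for a given pair (x, y)."""
    return (all(xj >= -yj - TOL for xj, yj in zip(x, y))
            and all(yj >= -TOL for yj in y)
            and sum(y) <= L + TOL)


# Hypothetical 130/30 portfolio: long 130%, short 30%, fully invested.
x = [0.7, 0.6, -0.2, -0.1]
x0 = [0.25, 0.25, 0.25, 0.25]     # hypothetical initial portfolio

y_short, short_ok = short_limit_certificate(x, 0.30)
_, short_too_tight = short_limit_certificate(x, 0.25)
y_turn, turnover_ok = turnover_certificate(x, x0, 2.0)
_, turnover_too_tight = turnover_certificate(x, x0, 1.0)
```

In an optimization model the vector `y` is a decision variable chosen by the solver; the functions above only show which value of `y` certifies feasibility for a given `x`.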


Maximizing the Sharpe Ratio 


The three equivalent mean-variance models (6.13), (6.14), and (6.15) define a
frontier of efficient portfolios. These portfolios are determined by some optimal
tradeoff of expected return and variance, or equivalently, standard deviation of
return. The ratio of expected return to standard deviation, called the Sharpe ratio
or reward-to-risk ratio, singles out the efficient portfolio that offers the highest
reward per measure of risk.

Definition 6.3 (Sharpe ratio) The Sharpe ratio of a given portfolio
$x = [x_1 \; \cdots \; x_n]^\top$ is the ratio of its expected return to its volatility (standard
deviation) of return:

\[ \text{Sharpe ratio} := \frac{\mu^\top x}{\sqrt{x^\top V x}}. \]


As we further elaborate in the next sections, sometimes $\mu$ may not necessarily
stand for the vector of expected absolute returns but instead it may make sense
for $\mu$ to stand for the vector of expected relative returns. In particular, if there is
a risk-free asset, in the above definition of the Sharpe ratio it is usual to assume
that $\mu$ stands for the vector of expected excess returns. The excess return of an
asset is simply the difference of its return and the risk-free return.

As an alternative or a complement to the equivalent mean-variance models
(6.13), (6.14), and (6.15), consider the problem of finding the efficient portfolio
with maximum Sharpe ratio. The natural formulation for this problem is the
following:

\[
\begin{array}{rl}
\max_x & \dfrac{\mu^\top x}{\sqrt{x^\top V x}} \\
\text{s.t.} & Ax = b \\
 & Dx \ge d.
\end{array}
\tag{6.16}
\]

This natural formulation is evidently not a quadratic optimization model.
Furthermore, the formulation is not convex, as the objective function to be
maximized is not concave.
We next show that this problem can be recast as a convex quadratic optimization
problem via a suitable homogenization. To this end, make the following mild
assumptions:

• There is a feasible portfolio $x$ such that $\mu^\top x > 0$.
• The matrices $A$, $D$ and vector $\mu$ satisfy the following technical condition:

  \[ Az = 0, \; Dz \ge 0 \;\Rightarrow\; \mu^\top z \le 0. \]

  The latter condition readily holds when the following stronger but easier
  to verify condition holds:

  \[ Az = 0, \; Dz \ge 0 \;\Rightarrow\; z = 0. \]

The above assumptions ensure the soundness of the approach described next. To 
see what goes wrong when these assumptions do not hold, see the exercises at 
the end of the chapter. 

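The gist of the reformulation of (6.16) as a quadratic optimization problem
is the following homogenization. Consider the change of variables obtained by
putting $z := \kappa x$, where $\kappa > 0$ is a new scalar variable. The problem (6.16) can
be rewritten as

\[
\begin{array}{rl}
\max_{z,\kappa} & \dfrac{\mu^\top z}{\sqrt{z^\top V z}} \\
\text{s.t.} & A\dfrac{z}{\kappa} = b \\
 & D\dfrac{z}{\kappa} \ge d \\
 & \kappa > 0.
\end{array}
\tag{6.17}
\]

The assumption $\mu^\top x > 0$ for some feasible $x$ implies that we can choose $\kappa > 0$
such that $\mu^\top z = 1$. Once $\mu^\top z = 1$ is imposed, maximizing the ratio amounts to
minimizing $z^\top V z$. The second assumption rules out points with $\kappa = 0$, since
$Az = 0$ and $Dz \ge 0$ would imply $\mu^\top z \le 0$. It follows
that the problem (6.17) is equivalent to

\[
\begin{array}{rl}
\min_{z,\kappa} & z^\top V z \\
\text{s.t.} & \mu^\top z = 1 \\
 & Az - b\kappa = 0 \\
 & Dz - d\kappa \ge 0 \\
 & \kappa \ge 0.
\end{array}
\tag{6.18}
\]

Given an optimal solution $(z^*, \kappa^*)$ of (6.18), the portfolio $x^* = z^*/\kappa^*$ solves (6.16).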

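As the exercises at the end of the chapter detail, this approach also yields the
following characterization of the portfolio with maximum Sharpe ratio in the
case when we only include the full investment constraint $\mathbf{1}^\top x = 1$.


Proposition 6.4 Suppose the minimum-risk portfolio $(1/\mathbf{1}^\top V^{-1}\mathbf{1})\,V^{-1}\mathbf{1}$ has
positive expected return; that is, $\mu^\top V^{-1}\mathbf{1} > 0$. Then the solution to the following
maximum Sharpe ratio problem

\[
\begin{array}{rl}
\max_x & \dfrac{\mu^\top x}{\sqrt{x^\top V x}} \\
\text{s.t.} & \mathbf{1}^\top x = 1
\end{array}
\tag{6.19}
\]

is the tangency portfolio

\[ x^* = \frac{1}{\mathbf{1}^\top V^{-1}\mu}\, V^{-1}\mu. \]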
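
The homogenization can be traced numerically in the fully invested case. With only the constraint $\mathbf{1}^\top x = 1$, problem (6.18) reduces to minimizing $z^\top V z$ subject to $\mu^\top z = 1$ with $\kappa = \mathbf{1}^\top z$, whose solution is the characteristic portfolio of $\mu$; rescaling by $\kappa$ recovers the maximum Sharpe ratio portfolio. The sketch below uses hypothetical data (`V`, and `mu` read as expected excess returns) and compares the result with a coarse grid of fully invested portfolios.

```python
import math


def solve(A, b):
    """Solve the square system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


# Hypothetical covariance matrix and expected excess returns.
V = [[0.04, 0.01, 0.00],
     [0.01, 0.09, 0.02],
     [0.00, 0.02, 0.16]]
mu = [0.03, 0.06, 0.10]


def sharpe(x):
    """Sharpe ratio mu^T x / sqrt(x^T V x)."""
    return dot(mu, x) / math.sqrt(dot(x, matvec(V, x)))


w = solve(V, mu)                        # V^{-1} mu
z = [wi / dot(mu, w) for wi in w]       # minimises z^T V z subject to mu^T z = 1
kappa = sum(z)                          # kappa = 1^T z; positive for this data
x_star = [zi / kappa for zi in z]       # recovered portfolio x = z / kappa

# Coarse grid of fully invested portfolios, short positions allowed.
best_on_grid = max(sharpe([a / 10, b / 10, 1 - a / 10 - b / 10])
                   for a in range(-10, 21) for b in range(-10, 21))
```

The recovered portfolio coincides with $V^{-1}\mu / (\mathbf{1}^\top V^{-1}\mu)$, and its Sharpe ratio equals $\sqrt{\mu^\top V^{-1}\mu}$, which no portfolio on the grid exceeds.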


Portfolio Management Relative to a Benchmark 


In an investment portfolio, the security selection problem is concerned with deter- 
mining the holdings of specific securities within a given asset class. It is customary 
to manage and evaluate the portfolio of securities relative to some predefined 
benchmark portfolio that represents a particular asset class. The benchmark 
portfolio provides a reference point. It serves the role of the market portfolio 
if the investment universe is restricted to the particular asset class that the 
benchmark represents. The management of a portfolio of securities relative to a 
benchmark could be passive or active. The goal of the former is to replicate the 
benchmark whereas the goal of the latter is to beat the benchmark. 

Systematic (Beta) and Individual (Alpha) Returns


Both passive and active management rely on a fundamental decomposition of
an individual security's return into systematic and individual (or residual)
components. The former is the component of return that can be explained by the
security's exposure to the benchmark. The latter is the component of return that
is idiosyncratic to the individual security.

To make the above decomposition more precise, assume the investment universe
determined by a particular asset class includes $n$ individual securities. Let
$r_i$ denote the excess return of security $i$ for $i = 1,\dots,n$. Let $r_B$
denote the excess return of the benchmark.

The return of security $i$ can be decomposed via the following linear regression
model:

\[ r_i = \beta_i r_B + \theta_i, \]

where $\theta_i$ is the component of return uncorrelated with $r_B$; that is,
$\operatorname{cov}(r_B, \theta_i) = 0$. The coefficient $\beta_i$ is the beta of
security $i$ relative to the benchmark $B$ and is given by

\[ \beta_i = \frac{\operatorname{cov}(r_i, r_B)}{\operatorname{var}(r_B)}. \]

The term $\beta_i r_B$ is the systematic component of return of security $i$.
The term $\theta_i$ is the residual component of return of security $i$. The
alpha of security $i$ is the expected value of the residual return $\theta_i$:

\[ \alpha_i = \mathbb{E}(\theta_i). \]


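Consider a portfolio of securities with percentage holdings
$x = [x_1\ \cdots\ x_n]^{\mathsf T}$. The above type of decomposition also
applies to the portfolio return

\[ r_P := r^{\mathsf T} x = r_1 x_1 + \cdots + r_n x_n. \]

That is, we can decompose the portfolio return $r_P$ as

\[ r_P = \beta_P r_B + \theta_P, \]

where the systematic and residual components of the portfolio return are
respectively

\[ \beta_P r_B = (\beta^{\mathsf T} x)\, r_B = (\beta_1 x_1 + \cdots + \beta_n x_n)\, r_B \]

and

\[ \theta_P = \theta^{\mathsf T} x = \theta_1 x_1 + \cdots + \theta_n x_n. \]

Furthermore, it is easy to see that the beta and alpha of the portfolio are
respectively

\[ \beta_P = \beta^{\mathsf T} x = \beta_1 x_1 + \cdots + \beta_n x_n \]

and

\[ \alpha_P = \mathbb{E}(\theta_P) = \alpha^{\mathsf T} x = \alpha_1 x_1 + \cdots + \alpha_n x_n. \]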
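
The decomposition can be checked numerically. The sketch below is an illustration on simulated data, not a procedure from the text: it replaces the population covariance and variance by their sample counterparts, and the return series, the "true" betas, and the holdings `x` are all hypothetical. It confirms that the beta and alpha of a portfolio are the holdings-weighted sums of the security betas and alphas.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 500, 4

# Simulated excess returns (hypothetical): benchmark r_B and n securities.
r_B = rng.normal(0.005, 0.04, size=T)
true_beta = np.array([0.8, 1.0, 1.2, 1.5])
R = r_B[:, None] * true_beta + rng.normal(0.001, 0.02, size=(T, n))

# beta_i = cov(r_i, r_B) / var(r_B), with sample moments in place of true ones.
r_B_c = r_B - r_B.mean()
beta = (R - R.mean(axis=0)).T @ r_B_c / (r_B_c @ r_B_c)

theta = R - np.outer(r_B, beta)   # residual returns theta_i(t)
alpha = theta.mean(axis=0)        # alpha_i = E(theta_i), estimated by the mean

# Portfolio with percentage holdings x: repeat the decomposition on r_P = r'x.
x = np.array([0.4, 0.3, 0.2, 0.1])
r_P = R @ x
beta_P = (r_P - r_P.mean()) @ r_B_c / (r_B_c @ r_B_c)
theta_P = r_P - beta_P * r_B

print("beta_P:", beta_P, " beta'x:", beta @ x)
print("alpha_P:", theta_P.mean(), " alpha'x:", alpha @ x)
```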

Active Return, Tracking Error, Information Ratio


Consider a portfolio with percentage holdings
$x = [x_1\ \cdots\ x_n]^{\mathsf T}$. The active return of the portfolio is the
difference between the portfolio return and the benchmark return:

\[ r^{\mathsf T} x - r_B. \]

If the portfolio of benchmark holdings is
$x^B = [x_1^B\ \cdots\ x_n^B]^{\mathsf T}$, then $r_B = r^{\mathsf T} x^B$
and thus the active return can also be written as

\[ r^{\mathsf T} x - r_B = r^{\mathsf T}(x - x^B). \]

The vector $x - x^B$ is the vector of active holdings of the portfolio.
The active risk or tracking error $\psi$ of a portfolio is the standard
deviation of the portfolio active return. In other words,

\[ \psi^2 := \operatorname{var}\bigl(r^{\mathsf T}(x - x^B)\bigr). \]

Some straightforward matrix calculations show that if $V$ is the covariance
matrix of securities returns, then

\[ \psi^2 = \operatorname{var}\bigl(r^{\mathsf T}(x - x^B)\bigr) = (x - x^B)^{\mathsf T} V (x - x^B). \]

A straightforward calculation also shows that the active risk can be decomposed
as

\[ \psi^2 = (\beta_P - 1)^2 \sigma_B^2 + \omega_P^2, \]

where $\sigma_B^2 = \operatorname{var}(r_B)$ and
$\omega_P^2 = \operatorname{var}(\theta_P)$. The first term
$(\beta_P - 1)^2 \sigma_B^2$ is the component of active risk due to the active
beta $\beta_P - 1$ of the portfolio. The second term $\omega_P^2$ is the
portfolio residual risk. Observe that the active risk and residual risk are the
same when $\beta_P = 1$.
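
The decomposition of active risk follows in a few lines from the regression identity for the portfolio return and from the fact that the portfolio residual return, being a combination of the security residuals, is uncorrelated with the benchmark return. Written out, using only the quantities already introduced:

```latex
\begin{aligned}
r^{\mathsf T} x - r_B
  &= (\beta_P r_B + \theta_P) - r_B
   = (\beta_P - 1)\, r_B + \theta_P, \\
\operatorname{var}\bigl(r^{\mathsf T} x - r_B\bigr)
  &= (\beta_P - 1)^2 \operatorname{var}(r_B)
     + 2(\beta_P - 1)\operatorname{cov}(r_B, \theta_P)
     + \operatorname{var}(\theta_P) \\
  &= (\beta_P - 1)^2 \sigma_B^2 + \omega_P^2,
  \qquad\text{since } \operatorname{cov}(r_B, \theta_P)
  = \sum_{i=1}^{n} x_i \operatorname{cov}(r_B, \theta_i) = 0 .
\end{aligned}
```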

The information ratio is a cousin of the Sharpe ratio defined in Section 6.2. 


Definition 6.5 (Information ratio) The information ratio (IR) of a portfolio
$P$ is the ratio of expected residual return to volatility (standard deviation)
of residual return:

\[ \mathrm{IR} = \frac{\alpha_P}{\omega_P}. \]


Portfolio Optimization with Benchmark Considerations 


The consideration of a benchmark in portfolio construction typically leads to 
mean-variance models that include some adjustments and constraints induced 
by the benchmark. 

The following are some of the most common adjustments and constraints when 
a mean-variance model is used for portfolio construction relative to a benchmark: 


• Use expected residual returns $\alpha^{\mathsf T} x$ instead of expected
  total return $\mu^{\mathsf T} x$.

• Use active risk $\psi^2 = (x - x^B)^{\mathsf T} V (x - x^B)$ instead of total
  risk $x^{\mathsf T} V x$.

• Bounds on the size of active positions. These constraints are typically of
  the form

  \[ L \le x - x^B \le U \]

  that restrict the deviations between the portfolio holdings and the
  benchmark holdings.

• Bounds on the beta of the portfolio. Again this type of constraint is typically
  of the form

  \[ L \le \beta^{\mathsf T} x - 1 \le U. \]


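As an example, the optimization problem might be

\[
\begin{array}{rl}
\max\limits_{x} & \alpha^{\mathsf T} x \\
\text{s.t.} & (x - x^B)^{\mathsf T} V (x - x^B) \le \psi^2 \\
 & \mathbf{1}^{\mathsf T} x = 1 \\
 & x \ge 0.
\end{array}
\tag{6.20}
\]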
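
To make this kind of model concrete, here is a minimal sketch for a simplified variant that keeps only a tracking-error budget and the full-investment constraint. Dropping all other constraints (in particular any sign constraints on the holdings) is an assumption made here so that the solution has a closed form; with further constraints a conic or nonlinear solver would be needed. The function name and all numerical inputs are hypothetical.

```python
import numpy as np

def max_alpha_given_tracking_error(alpha, V, x_bench, psi):
    """Maximize alpha'x  s.t.  (x - x_bench)'V(x - x_bench) <= psi^2,  1'x = 1.

    Simplification: no sign or bound constraints on x are imposed.
    """
    ones = np.ones(len(alpha))
    Vinv_alpha = np.linalg.solve(V, alpha)
    Vinv_ones = np.linalg.solve(V, ones)
    # Direction of the active holdings; by construction its entries sum to zero.
    d = Vinv_alpha - (ones @ Vinv_alpha) / (ones @ Vinv_ones) * Vinv_ones
    risk_d = np.sqrt(d @ V @ d)
    if risk_d < 1e-14:
        return x_bench.copy()      # all alphas equal: no active bet adds value
    return x_bench + psi * d / risk_d   # scale so the tracking error equals psi

# Hypothetical inputs: a single-factor covariance matrix and four alphas.
beta = np.array([0.9, 1.0, 1.1, 1.2])
V = 0.04 * np.outer(beta, beta) + np.diag([0.02, 0.03, 0.025, 0.04])
alpha = np.array([0.010, -0.005, 0.020, 0.000])
x_bench = np.full(4, 0.25)
psi = 0.02

x_opt = max_alpha_given_tracking_error(alpha, V, x_bench, psi)
active = x_opt - x_bench
print("holdings:", x_opt)
print("tracking error:", np.sqrt(active @ V @ active))
```

After solving, one would still check whether the resulting holdings respect any omitted constraints, such as non-negativity.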


Estimation of Inputs to Mean-Variance Models


The estimation of input parameters, namely the covariance matrix of returns $V$
and the vector of total expected returns $\mu$ or residual expected returns
$\alpha$, is one of the most critical and challenging steps in the use of
mean-variance models. We next describe some of the central ideas that underlie
most popular approaches to this fundamental problem. A comprehensive treatment
of this subject is well beyond the scope of this book. Thus we only describe the
key building blocks of factor models. We refer the reader to the textbooks of
Grinold and Kahn (1999) and Litterman (2003) and to the articles by Rosenberg
(1974) and Ledoit and Wolf (2003, 2004) as well as the references therein for
further details on the vast variety of techniques and approaches that can be
used for estimating the mean-variance input parameters $V$ and $\mu$.

Throughout this section assume the investment universe has $n$ assets and let
$r_i$ denote the excess return of asset $i$ for $i = 1,\dots,n$. Let
$r \in \mathbb{R}^n$ denote the vector of excess returns. A rudimentary approach
is to estimate $\mu$ and $V$ via sample means and sample covariances based on
historical data. More precisely, given a time series of realized excess returns
$r(1), r(2), \dots, r(T)$, the vector of sample means and the sample covariance
matrix are respectively

\[
\hat\mu = \frac{1}{T}\sum_{t=1}^{T} r(t),
\qquad
\hat V = \frac{1}{T-1}\sum_{t=1}^{T} \bigl(r(t) - \hat\mu\bigr)\bigl(r(t) - \hat\mu\bigr)^{\mathsf T}.
\]
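
For concreteness, the sample estimators can be computed as in the following sketch. The return matrix `R` is simulated, and the 1/(T-1) normalization of the covariance is an assumption of this example (a 1/T normalization is also in common use).

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 120, 4
R = rng.normal(0.005, 0.05, size=(T, n))   # row t holds r(t); made-up data

mu_hat = R.mean(axis=0)                    # vector of sample means
centered = R - mu_hat
V_hat = centered.T @ centered / (T - 1)    # sample covariance matrix

print("sample means:", mu_hat)
print("sample covariance:\n", V_hat)
```

The result agrees with NumPy's built-in `np.cov(R, rowvar=False)`, which uses the same normalization by default.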


The vector $\hat\mu$ and matrix $\hat V$ provide estimates of $\mu$ and $V$.
However, these estimators have three major shortcomings:


• The sample mean and sample covariance do not incorporate other data that
  could contain useful forecasting information.

• For an investment universe with $n$ assets, there are a total of
  $n + \tfrac{1}{2}n(n+1) = \tfrac{1}{2}n(n+3)$ different parameters to
  estimate. Although this could be manageable for a small asset allocation
  model, it is not viable for an equity portfolio management model, as the
  number of securities $n$ in a stock universe could easily range in the
  hundreds or thousands.

• The sample mean and sample covariance inevitably contain a fair amount of
  estimation errors, which, as we further explain in the next chapter, are
  magnified by the mean-variance optimizer.


The first two shortcomings above can be largely mitigated by assuming some 
kind of structure in the portfolio returns r, as the following subsections detail. 
The next chapter is devoted entirely to the third shortcoming. 


Single-Factor Model 


The task of estimating a risk model can be drastically simplified by assuming
that each asset has two components of risk: market risk and residual risk. This is
a single-factor risk model. Historically this model was introduced by Sharpe as
an intellectual precursor of the capital asset pricing model (CAPM). The model
assumes that excess returns are decomposed as in the following regression model:

\[ r_i = \beta_i r_M + \theta_i. \]

Here $\beta_i$ is the beta of asset $i$, and $\theta_i$ is its residual return,
uncorrelated with $r_M$. The model also assumes that the residual returns
$\theta_i$ are uncorrelated with each other. The rationale for the model is that
a single common factor $r_M$, typically the return of the market portfolio,
accounts for all of the common shocks between pairs of assets. The parameter
$\beta_i$ is also called the factor loading or factor exposure of asset $i$. The
component $\theta_i$ is also called the residual or specific return of asset
$i$, as it is the portion of $r_i$ not accounted for by the common factor $r_M$.
A bit of algebra shows that in this model the expected return of asset $i$ is

\[ \mathbb{E}(r_i) = \beta_i \mathbb{E}(r_M) + \mathbb{E}(\theta_i), \]

the covariance between two different assets $i$ and $j$ is

\[ \operatorname{cov}(r_i, r_j) = \beta_i \beta_j \sigma_M^2, \]

and the variance of asset $i$ is

\[ \operatorname{var}(r_i) = \beta_i^2 \sigma_M^2 + \omega_i^2, \]

where $\sigma_M^2 = \operatorname{var}(r_M)$ and
$\omega_i^2 = \operatorname{var}(\theta_i)$.
Using matrix-vector notation, the single-factor risk model assumption can be
succinctly written as

\[ r = \beta r_M + \theta \]

and the vector of expected returns and covariance matrix can be written as

\[
\mathbb{E}(r) = \beta\,\mathbb{E}(r_M) + \mathbb{E}(\theta),
\qquad
V = \sigma_M^2 \beta\beta^{\mathsf T} + D,
\]

where $D$ is the diagonal matrix
$D = \operatorname{diag}(\omega_1^2,\dots,\omega_n^2) = \operatorname{cov}(\theta)$.

We observe that under the single-factor model, the estimation of the covariance
matrix only requires the estimation of $\beta$, $\sigma_M^2$, and $D$. That is a
total of $n + 1 + n = 2n + 1$ parameters in contrast to the
$\tfrac{1}{2}n(n+1)$ parameters for a non-structured covariance matrix. The
particular structure of the covariance matrix for a single-factor risk model
also enables the derivation of some interesting properties of minimum-risk
portfolios. (See the exercises at the end of the chapter.)

A basic estimation of the parameters of a single-factor model can be performed
as follows. Assume we have some historical data of realized returns
$r(1),\dots,r(T)$ as well as the corresponding returns for the factor
$r_M(1),\dots,r_M(T)$. Use these data to run $n$ simple linear regressions

\[ r_i = \alpha_i + \beta_i r_M + \varepsilon_i, \qquad i = 1,\dots,n. \]

Each of these linear regressions yields estimates $\hat\beta_i$ of $\beta_i$,
$\hat\alpha_i$ of $\mathbb{E}(\theta_i)$, and $\hat\omega_i^2$ of
$\operatorname{var}(\varepsilon_i) = \operatorname{var}(\theta_i)$. Using the
historical data $r_M(1),\dots,r_M(T)$ for the factor, we can also obtain an
estimate $\hat\sigma_M^2$ of $\operatorname{var}(r_M)$.
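
The regression procedure can be sketched in a few lines. The example below runs on simulated data (the factor returns, loadings, and residual volatilities are hypothetical), fits all n regressions at once by ordinary least squares, and assembles the covariance estimate as the estimated factor variance times the outer product of the estimated betas, plus the diagonal matrix of estimated residual variances. The degrees-of-freedom corrections are a choice made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 600, 5

# Simulated history (hypothetical): factor returns and asset excess returns.
r_M = rng.normal(0.004, 0.045, size=T)
beta_true = np.linspace(0.6, 1.4, n)
omega_true = np.linspace(0.02, 0.05, n)
R = np.outer(r_M, beta_true) + rng.normal(0.0, omega_true, size=(T, n))

# n simple regressions r_i = alpha_i + beta_i r_M + eps_i, solved jointly.
X = np.column_stack([np.ones(T), r_M])
coef, *_ = np.linalg.lstsq(X, R, rcond=None)
alpha_hat, beta_hat = coef                  # row 0: intercepts, row 1: slopes

resid = R - X @ coef
omega2_hat = resid.var(axis=0, ddof=2)      # residual variances
sigma2_M_hat = r_M.var(ddof=1)              # factor variance

# Single-factor covariance estimate with 2n + 1 estimated parameters.
V_hat = sigma2_M_hat * np.outer(beta_hat, beta_hat) + np.diag(omega2_hat)
print("estimated betas:", beta_hat)
```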

The above basic regression method can be enhanced to produce more accurate
estimates. In particular, it is known that the quality of the estimates of
$\beta$ can be improved via a shrinkage procedure as explained by Blume (1975).
The basic idea, which can be traced back to the classical work of Stein (1956),
is that improved estimates of $\beta$ can be obtained by taking a convex
combination of the raw estimates $\hat\beta$ and the vector of ones
$\mathbf{1}$:

\[ (1 - \tau)\hat\beta + \tau\mathbf{1}, \]

for some shrinkage factor $\tau$. The articles of Ledoit and Wolf (2003, 2004)
elaborate further on using shrinkage for improved estimates of the covariance
matrix. Efron and Morris (1977) present a related and entertaining discussion of
shrinkage estimation applied to baseball statistics.
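
A small sketch of the shrinkage step follows. The function name, the raw betas, and the shrinkage factor of 1/3 are assumptions chosen purely for illustration; the text does not prescribe a value for the shrinkage factor.

```python
import numpy as np

def shrink_betas(beta_raw, tau):
    """Convex combination (1 - tau) * beta_raw + tau * 1 of raw betas and ones."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1] for a convex combination")
    beta_raw = np.asarray(beta_raw, dtype=float)
    return (1.0 - tau) * beta_raw + tau * np.ones_like(beta_raw)

beta_raw = np.array([0.4, 0.9, 1.0, 1.7])        # hypothetical raw estimates
beta_shrunk = shrink_betas(beta_raw, tau=1.0 / 3.0)
print(beta_shrunk)
```

Every estimate moves toward 1, and the extreme estimates move the most in absolute terms.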

The estimates of $\sigma_M$ and of $\omega_i$ can also be improved by using
techniques such as exponential smoothing and generalized autoregressive
conditional heteroskedasticity (GARCH) (Campbell et al., 1997; Engle, 1982).

The CAPM is related to, although not the same as, a single-factor risk model.
In the context of a single-factor model where the factor is the return $r_M$ of
the market portfolio, the CAPM postulates

\[ \mathbb{E}(r_i) = \beta_i\,\mathbb{E}(r_M). \]

In other words, the expected value of the asset-specific return is zero. The CAPM
thus gives a straightforward estimation procedure for the vector of expected
returns $\mu = \mathbb{E}(r)$, namely $\hat\mu := \hat\beta\hat\mu_M$, where
$\hat\beta$ and $\hat\mu_M$ are estimates of $\beta$ and $\mathbb{E}(r_M)$
respectively. As we discuss in Section 6.6 below, other alternatives for
estimating expected returns are often used in equity portfolio management.

Constant Correlation Models


A second way of imposing structure on the asset returns is to assume that the
correlation between any two different assets in the investment universe is the
same. Under this assumption, the estimation of the covariance matrix only
requires an estimate of each individual asset volatility $\sigma_i$ and the
average correlation $\rho$ between different pairs of assets. This yields a
“quick and dirty” estimate of the covariance matrix given by

\[ \operatorname{cov}(r_i, r_j) = \rho\,\sigma_i\sigma_j, \qquad i \ne j. \]

In this model the estimation of the covariance matrix only requires estimates of
$\sigma$ and $\rho$. That is a total of $n + 1$ parameters.

Under the reasonable assumption that $\rho > 0$, the constant correlation model
can be seen as the following kind of single-factor model with predetermined
factor loadings. Assume the following single-factor model for volatility scaled
excess returns:

\[ \frac{r_i}{\sigma_i} = f + \theta_i, \]

where $f$ is a common factor to all scaled returns and $\theta_i$ is a specific
scaled return on asset $i$. It is easy to see that this particular single-factor
model yields a constant correlation model with $\rho$ being the variance of the
single factor $f$.
Using matrix notation, the constant correlation covariance matrix can be
written as

\[ V = \rho\,\sigma\sigma^{\mathsf T} + (1 - \rho)\operatorname{diag}(\sigma)^2. \]

A basic estimation procedure for this model is straightforward. First, using
historical data, compute estimates $\hat\sigma_i$ of $\sigma_i$ and estimates
$\hat\rho_{ij}$ of each correlation $\rho_{ij}$ for all $i \ne j$. Then take the
average

\[ \hat\rho = \frac{1}{n(n-1)}\sum_{i \ne j}\hat\rho_{ij} \]

as an estimate of $\rho$.
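
The two-step procedure translates directly into code. The sketch below is an illustration on simulated returns; the function name is hypothetical, and the data are generated from a model whose true common correlation is 0.3 so that the output can be sanity-checked.

```python
import numpy as np

def constant_correlation_cov(R):
    """Constant-correlation covariance estimate from a T x n matrix of returns."""
    R = np.asarray(R, dtype=float)
    n = R.shape[1]
    sigma_hat = R.std(axis=0, ddof=1)            # individual volatilities
    C = np.corrcoef(R, rowvar=False)             # pairwise sample correlations
    # Average over the n(n - 1) ordered pairs i != j (diagonal excluded).
    rho_hat = (C.sum() - np.trace(C)) / (n * (n - 1))
    V_hat = (rho_hat * np.outer(sigma_hat, sigma_hat)
             + (1.0 - rho_hat) * np.diag(sigma_hat ** 2))
    return V_hat, sigma_hat, rho_hat

# Simulated data (hypothetical): common factor plus specific scaled returns.
rng = np.random.default_rng(5)
T, n, rho = 500, 6, 0.3
sigma = np.linspace(0.02, 0.07, n)
f = rng.normal(size=(T, 1))
e = rng.normal(size=(T, n))
R = sigma * (np.sqrt(rho) * f + np.sqrt(1.0 - rho) * e)

V_hat, sigma_hat, rho_hat = constant_correlation_cov(R)
print("average correlation estimate:", rho_hat)
```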


Multiple-Factor Models 


Multiple-factor models are a generalization of the single-factor model discussed 
above. These models are based on the assumption that the return of each asset 
can be explained by a small collection of common factors in addition to some 
other specific return. Aside from simplifying the estimation task, multiple-factor 
models provide a useful breakdown of risk, incorporate some economic logic, 
and are fairly flexible. The majority of quantitative money managers rely on 
multi-factor models provided by third-party vendors such as MSCI, Axioma, 
Northfield, etc. for the management of equity portfolios. 

A multi-factor model assumes that excess returns are as follows:

\[ r_i = \sum_{k=1}^{K} \beta_{ik} f_k + u_i, \]

where

• $r_i$: excess return of asset $i$
• $\beta_{ik}$: exposure of asset $i$ to factor $k$
• $f_k$: rate of return of factor $k$
• $u_i$: specific (or residual) return of asset $i$.


It is convenient to rewrite the relation above in matrix form as

\[ r = Bf + u. \]

A bit of matrix algebra shows that the expected value and covariance of $r$ are
respectively

\[
\mathbb{E}[r] = B\,\mathbb{E}[f] + \mathbb{E}[u],
\qquad
V = BFB^{\mathsf T} + \Delta,
\]

where $F = \operatorname{cov}(f)$ and $\Delta = \operatorname{cov}(u)$. Observe
that $\Delta$ is diagonal since the $u_i$ are assumed to be uncorrelated with
each other.

The construction and estimation of a multi-factor model hinges on the choice 
of factors. For an equity universe, the following three main classes of factors are 
commonly used: 


e Macroeconomic factors: inflation, economic growth, etc. 
e Fundamental factors: earning/price, dividend yield, market cap, etc. 
e Statistical factors: principal component analysis, hidden factors. 


Empirical evidence suggests that the second class, fundamental factors, works
better than the other two (Connor, 1995). This is also the prevalent class of
factors used by most risk model providers. In this approach we have

\[ r = Bf + u, \]

where the matrix of factor loadings $B$ is predetermined. The estimation of the
corresponding covariance matrix is as follows. Using historical data for the asset
returns, infer the corresponding historical data for factor returns by solving,
for each period $t$, the weighted least-squares problem

\[ \min_{f(t)}\ \bigl(r(t) - Bf(t)\bigr)^{\mathsf T} D^{-1} \bigl(r(t) - Bf(t)\bigr). \]

The matrix $D$ is a diagonal matrix whose entries are estimates of the asset
variances. A common proxy is to use instead the reciprocal of the market
capitalizations of the assets. The solution to this weighted least-squares
problem is

\[ f(t) = (B^{\mathsf T} D^{-1} B)^{-1} B^{\mathsf T} D^{-1} r(t). \]

Each row of the matrix $(B^{\mathsf T} D^{-1} B)^{-1} B^{\mathsf T} D^{-1}$ can
be interpreted as a factor mimicking portfolio.

Equipped with this historical data of factor returns, we can estimate the factor
covariance matrix. The residuals $u(t) := r(t) - Bf(t)$ can then be used to
estimate the covariance matrix $\Delta$ of asset-specific returns.
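
The estimation steps just described can be sketched as follows. Everything numerical here is simulated and hypothetical (loadings `B`, weights `d`, factor and specific returns); the point is only to show the weighted least-squares formula applied period by period and the assembly of the covariance estimate from the factor covariance and the residual variances.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K, T = 30, 3, 250

# Hypothetical inputs: predetermined loadings B and diagonal weights d = diag(D).
B = rng.normal(size=(n, K))
d = rng.uniform(0.01, 0.04, size=n) ** 2
f_true = rng.normal(0.0, 0.02, size=(T, K))
R = f_true @ B.T + rng.normal(0.0, np.sqrt(d), size=(T, n))   # row t is r(t)

# M = (B' D^{-1} B)^{-1} B' D^{-1}; each row is a factor-mimicking portfolio.
Dinv_B = B / d[:, None]
M = np.linalg.solve(B.T @ Dinv_B, Dinv_B.T)

f_hat = R @ M.T                       # row t is the WLS solution f(t)
resid = R - f_hat @ B.T               # residuals u(t) = r(t) - B f(t)

F_hat = np.cov(f_hat, rowvar=False)                 # factor covariance estimate
Delta_hat = np.diag(resid.var(axis=0, ddof=1))      # specific variances
V_hat = B @ F_hat @ B.T + Delta_hat
print("factor covariance estimate:\n", F_hat)
```

Each mimicking portfolio has unit exposure to its own factor and zero exposure to the others, which is why the product of `M` and `B` is the identity matrix.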

The connection between the CAPM and single-factor models has an analogous
counterpart in the context of multi-factor models, namely the arbitrage pricing
theory (APT). A combination of an arbitrage argument and the assumption that
the set of factors $f$ accounts for all of the common shocks to the returns of
all assets in the investment universe implies that

\[ \mathbb{E}(r) = B\,\mathbb{E}(f). \]

Like the CAPM, the APT model also yields a straightforward estimation procedure
for $\mu = \mathbb{E}(r)$.


Estimation of Alpha 


In a benchmark-relative context, an estimate of expected residual returns
$\alpha$ is typically the relevant estimate instead of an estimate of expected
total return $\mu$.

According to the CAPM or the more general APT model, the expected residual 
returns are zero. However, numerous articles have documented certain anomalies 
that are systematically associated with the over- and underperformance of the 
return of securities after controlling for their systematic component of return. 
Some of these anomalies include the SMB (small minus big market capitalization) 
and HML (high minus low book-to-price) factors introduced in the classical 
article by Fama and French (1992). 

A generic approach for generating alpha is to rely on signals unveiled via 
a judicious type of analysis. A signal could be an empirical observation such as 
momentum that suggests that the recent performance (good or bad) of individual 
securities will persist in the near term. A signal could also be a financial principle 
such as “firms with low book-to-price ratio will outperform” or “firms with higher 
earnings per share will outperform”. 

The following is a reasonable and popular rule of thumb for transforming a 
signal into a forecast of alpha (for a detailed discussion see Grinold and Kahn 
(1999)): 


\[ \text{alpha} = (\text{residual volatility}) \cdot \mathrm{IC} \cdot \text{score}. \]


Here the residual volatility is the standard deviation of residual return. The score 
is a numerical score associated with the signal. The score is assumed to be scaled 
so that its cross-sectional mean and standard deviation are respectively 0 and 1. 
Finally, the information coefficient IC is a measure of the forecasting quality of 
the signal; that is, the correlation between the raw signal score and the residual 
return. 
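
A short sketch of this rule of thumb follows. The function name, the raw signal values, the residual volatilities, and the information coefficient of 0.05 are all hypothetical. The raw signal is standardized cross-sectionally before it is scaled, and no neutralization is attempted here.

```python
import numpy as np

def alpha_from_signal(raw_signal, residual_vol, ic):
    """alpha_i = residual_vol_i * IC * score_i, with standardized scores."""
    raw_signal = np.asarray(raw_signal, dtype=float)
    # Scale the signal to cross-sectional mean 0 and standard deviation 1.
    score = (raw_signal - raw_signal.mean()) / raw_signal.std()
    alpha = np.asarray(residual_vol, dtype=float) * ic * score
    return alpha, score

raw_signal = np.array([2.1, -0.4, 0.7, 1.3, -1.9])        # e.g., a momentum signal
residual_vol = np.array([0.20, 0.25, 0.18, 0.30, 0.22])   # annualized, made up
alpha, score = alpha_from_signal(raw_signal, residual_vol, ic=0.05)
print("scores:", score)
print("alphas:", alpha)
```

With a small information coefficient the resulting alphas are a small fraction of residual volatility, as one would expect from a weak signal.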

In addition to proper scaling, the signal score should be neutralized so that
the alphas do not include biases or undesirable bets on the benchmark or on risk 
factors. As we illustrate in the exercises at the end of the chapter, neutralization 
can be achieved in various ways, as there are multiple portfolios that hedge out 
a bet on the benchmark or on other risk factors. 


Performance Analysis 


How can the performance of a portfolio manager be evaluated? Are the ex post 
results due to skill or luck? The goal of performance analysis is to answer 
these questions. The efficient market hypothesis suggests that skillful active 
management is impossible. However, there is considerable evidence against the 
efficient market hypothesis (Shleifer, 2000). 

Empirical results also suggest that an average active fund manager underper- 
forms their benchmark on a risk-adjusted basis. Furthermore, empirical evidence 
also shows that good performance does not persist: The winners this year are 
almost as likely to be winners or losers next year. These are bleak conclusions 
about asset management. So how could we tell which asset managers are the 
good ones? 

The fundamental goal of performance analysis is to separate skill from luck. 
The simplest type of performance analysis is a cross-sectional comparison of 
returns over some time period. This would distinguish winners from losers. 
However, these kinds of comparisons have several drawbacks. First, they typically 
do not represent the complete universe of investment managers but only those 
in existence during a specific time period. They generally contain survivorship 
bias. Perhaps worst of all, cross-sectional comparisons do not adjust for risk. By 
contrast, time-series analysis of returns can do a better job at separating skill 
from luck by measuring both return and risk. An even more complete picture 
can be obtained via time-series analysis of returns and portfolio holdings. 


Return-Based Performance Analysis (Basic) 


The development of the CAPM and the notion of market efficiency in the 1960s 
encouraged academics to tackle the problem of performance analysis. According 
to the CAPM, consistent exceptional returns are unlikely. Academics devised 
tests to check if the theory was correct. As a byproduct the first performance 
analysis techniques emerged. One approach, proposed by Jensen, consists of 
regressing the time series of realized portfolio excess returns against benchmark 
excess return: 


\[ r_P(t) = \alpha_P + \beta_P r_B(t) + \varepsilon_P(t). \]


Jensen’s alpha is simply the intercept $\alpha_P$ of this regression. According
to the CAPM, this intercept is zero. The regression yields not only alpha and
beta, but t-statistics that give information about their statistical
significance. The t-statistic for $\alpha_P$ is

\[ \text{t-stat} = \frac{\alpha_P}{\mathrm{SE}(\alpha_P)}. \]

As a rule of thumb, a t-statistic of 2 or more indicates that the performance of
the portfolio is due to skill rather than luck. Assuming normality, the probability
of observing such a large t-statistic purely by chance is smaller than 5%.
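
The regression and its t-statistics can be computed as in the following sketch, which uses simulated monthly excess returns (all parameter values are hypothetical). It also compares an annualized information ratio, computed as annualized alpha over annualized residual volatility, with the t-statistic divided by the square root of the number of years; treating the data as monthly and annualizing with factors 12 and sqrt(12) are assumptions of this example.

```python
import numpy as np

def jensen_regression(r_P, r_B):
    """OLS fit of r_P(t) = alpha + beta * r_B(t) + eps(t) with t-statistics."""
    r_P, r_B = np.asarray(r_P, float), np.asarray(r_B, float)
    N = len(r_P)
    X = np.column_stack([np.ones(N), r_B])
    coef, *_ = np.linalg.lstsq(X, r_P, rcond=None)
    resid = r_P - X @ coef
    s2 = resid @ resid / (N - 2)                       # residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X))) # standard errors
    return coef, se, coef / se                         # (alpha, beta), SEs, t-stats

rng = np.random.default_rng(4)
months = 120
r_B = rng.normal(0.005, 0.04, size=months)
r_P = 0.003 + 1.1 * r_B + rng.normal(0.0, 0.01, size=months)

(alpha_P, beta_P), se, t_stats = jensen_regression(r_P, r_B)
omega_P = np.sqrt(np.sum((r_P - alpha_P - beta_P * r_B) ** 2) / (months - 2))

ir_annualized = np.sqrt(12.0) * alpha_P / omega_P    # annualized alpha / residual vol
ir_from_t = t_stats[0] / np.sqrt(months / 12.0)      # t-stat / sqrt(years)
print("alpha t-stat:", t_stats[0])
print("IR (annualized):", ir_annualized, " IR from t-stat:", ir_from_t)
```

The two information ratio figures differ slightly because the standard error of the intercept also depends on the sample mean and variance of the benchmark return.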

The t-statistic and the information ratio are closely related. The main
difference between them is that the information ratio is annualized. By
contrast, the t-statistic scales with the number of years of data. If we observe
returns over a period of $T$ years, the information ratio is approximately the
t-statistic divided by the square root of the number of years of observation:

\[ \mathrm{IR} \approx \frac{\text{t-stat}}{\sqrt{T}}. \]

The standard error of the information ratio is approximately

\[ \mathrm{SE}(\mathrm{IR}) \approx \frac{1}{\sqrt{T}}. \]

A simple alternative to Jensen’s approach is to compare Sharpe ratios for the
portfolio and the benchmark. A portfolio with

\[ \frac{\bar r_P}{\sigma_P} > \frac{\bar r_B}{\sigma_B}, \]

where $\bar r$ denotes mean excess return over the period, has demonstrated
positive performance. Once again, the statistical significance of this
relationship is relevant for distinguishing luck from skill. If we assume that
the standard errors of the portfolio and benchmark volatilities are fairly small
compared to the standard errors of $\bar r$, then the standard error of the
Sharpe ratio is approximately $1/\sqrt{N}$, where $N$ is the number of
observations. Hence a statistically significant demonstration of skill occurs
when the difference between the two Sharpe ratios is large relative to this
standard error.


Return-Based Style Analysis 


Style analysis was developed by Nobel laureate William Sharpe (1992). The 
popularity of this concept was aided by a study (Brinson et al., 1991) concluding 
that 91.5% of the variation in returns of 82 mutual funds could be explained 
by the allocation to bills, stocks, and bonds. Later studies considering asset 
allocation across a broader range of asset classes have shown that as much as 
97% of fund returns can be explained by asset allocation alone. 

Style analysis attempts to determine the effective asset mix of a fund using
only the time series of returns for the fund and for a number of carefully chosen
asset classes. Like a factor model approach, style analysis assumes that portfolio
returns have the form

\[ r_P(t) = w_1 f_1(t) + \cdots + w_m f_m(t) + u_P(t), \]

where the $f_j(t)$ are the returns of $m$ benchmark asset classes. The holdings
$w_j$, $j = 1,\dots,m$, represent the style of the portfolio, that is, the
effective allocation to the $m$ asset classes that could be replicated via a
passive portfolio. The term $u_P(t)$ represents the selection return; that is,
the portion of the portfolio return that style cannot explain. The effective
holdings can be estimated via the quadratic program

\[
\begin{array}{rl}
\min\limits_{w} & \operatorname{var}(u_P(t)) \\
\text{s.t.} & \sum\limits_{j=1}^{m} w_j = 1 \\
 & w_j \ge 0, \quad j = 1,\dots,m.
\end{array}
\tag{6.21}
\]


Notice that there are two key differences between this model and conventional
multiple regression. First, the weights are constrained to be non-negative and to
add up to 1. Second, instead of minimizing the sum of squared errors
$\sum_{t=1}^{T} u_P(t)^2$, we minimize the variance of these quantities. The
reason for the first restriction is that the $w_j$ are to be interpreted as an
effective asset allocation representing the style of the fund. In essence, they
create a fund-specific benchmark. The reason for the second difference is that
we want to allow for a non-zero selection effect by the fund manager. The model
finds the style that minimizes the variance of this effect. Once the optimal
weights are determined, the average value of $u_P(t)$ gives the value added by
the manager’s selection skills, which can be negative or positive.

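Assume the data available for style analysis are the return time series
$r_P(t), f_1(t), \dots, f_m(t)$ for $t = 1,\dots,T$. For ease of notation, put

\[
r := \begin{bmatrix} r_P(1) \\ \vdots \\ r_P(T) \end{bmatrix},
\qquad
F := \begin{bmatrix} f_1(1) & \cdots & f_m(1) \\ \vdots & \ddots & \vdots \\ f_1(T) & \cdots & f_m(T) \end{bmatrix},
\qquad
\mathbf{1} := \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}.
\]

Then the objective function in (6.21) can be written as

\[
\begin{aligned}
\operatorname{var}(r - Fw)
 &= \frac{1}{T}\,\|r - Fw\|^2 - \frac{1}{T^2}\bigl(\mathbf{1}^{\mathsf T}(r - Fw)\bigr)^2 \\
 &= \left(\frac{\|r\|^2}{T} - \frac{(\mathbf{1}^{\mathsf T} r)^2}{T^2}\right)
    - 2\left(\frac{r^{\mathsf T} F}{T} - \frac{(\mathbf{1}^{\mathsf T} r)\,\mathbf{1}^{\mathsf T} F}{T^2}\right) w \\
 &\quad + w^{\mathsf T}\left(\frac{1}{T} F^{\mathsf T} F - \frac{1}{T^2} F^{\mathsf T}\mathbf{1}\mathbf{1}^{\mathsf T} F\right) w.
\end{aligned}
\]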
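
The quadratic form above makes the style-analysis problem easy to solve numerically. The following sketch is one possible implementation, not the method of the text: it minimizes the variance of the selection return over the unit simplex with projected gradient descent, chosen here only because it needs nothing beyond NumPy (any quadratic programming solver would serve). The function names and the simulated fund and asset-class returns are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                       # entries in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]   # last index with a positive gap
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def style_weights(r, F, iters=2000):
    """Projected-gradient sketch: minimize var(r - F w) over the simplex."""
    T, m = F.shape
    f_sum = F.sum(axis=0)                              # F' 1
    Q = F.T @ F / T - np.outer(f_sum, f_sum) / T**2    # quadratic term
    c = F.T @ r / T - r.sum() * f_sum / T**2           # half the linear term
    step = 1.0 / (2.0 * np.linalg.eigvalsh(Q)[-1])     # 1 / gradient Lipschitz const.
    w = np.full(m, 1.0 / m)
    for _ in range(iters):
        w = project_to_simplex(w - step * 2.0 * (Q @ w - c))
    return w

# Simulated data (hypothetical): three asset classes, true style (0.5, 0.3, 0.2).
rng = np.random.default_rng(3)
T = 240
F = rng.normal(0.004, [0.02, 0.03, 0.04], size=(T, 3))
w_true = np.array([0.5, 0.3, 0.2])
r = F @ w_true + 0.002 + rng.normal(0.0, 0.005, size=T)

w_hat = style_weights(r, F)
selection = (r - F @ w_hat).mean()   # average selection return
print("estimated style:", w_hat)
print("average selection return:", selection)
```

The fixed iteration count is adequate for this well-conditioned example; asset classes with very different volatilities would call for more iterations or a dedicated solver.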

Style analysis provides an improved tool for measuring performance. The
constructed style usually tracks the performance of the fund more accurately
than a predefined benchmark. Style analysis also has some limitations. For
instance, the weights may not necessarily match the style disclosed by the fund
manager. However, as Sharpe puts it: “If it acts like a duck, it is ok to assume it is
a duck.” Style analysis also makes the simplifying assumption that the weights
are constant. This is clearly not the case in actively managed funds, even without
active trading. There exist some variations of style analysis that allow for weights
to change. The model gets a bit more technical because it needs to incorporate
some “regularization” term that prevents the weights from changing too much
too often.


Notes 


The mean-variance model was introduced in the seminal article of Markowitz
(1952). The CAPM was developed by Treynor (“Toward a theory of market value of
risky assets”, unpublished manuscript, 1961), Sharpe (1964), Lintner (1965),
and Mossin (1966), by building on the mean-variance approach of Markowitz.
In recognition of their work on portfolio choice and the CAPM, Sharpe and
Markowitz were jointly awarded the 1990 Nobel Prize in Economics. Both Lintner
and Mossin passed away before 1990 and Treynor’s manuscript was never
published.

The textbook by Grinold and Kahn (1999) is a classical reference in active 
portfolio management. In their textbook, Grinold and Kahn developed and relied 
extensively on characteristic portfolios. 


Exercises 


Exercise 6.1 The purpose of this exercise is to prove the two-fund theorem 
(Theorem 6.1). 


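(a) Find the Lagrangian function $L(x, \theta)$ for (6.1).
(b) Solve the optimality conditions $\nabla L(x, \theta) = 0$ to conclude that
the optimal solution to (6.1) is

\[
x^* = \lambda\,\frac{1}{\mathbf{1}^{\mathsf T} V^{-1}\mu}\,V^{-1}\mu
      + (1 - \lambda)\,\frac{1}{\mathbf{1}^{\mathsf T} V^{-1}\mathbf{1}}\,V^{-1}\mathbf{1},
\]

where $\lambda = \mathbf{1}^{\mathsf T} V^{-1}\mu/\gamma$.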


Exercise 6.2 Assume $\mu$ and $V$ are respectively the vector of expected returns
and covariance matrix of $n$ risky assets. Assume $V$ is non-singular and
$\bar\mu > \mu^{\mathsf T} V^{-1}\mathbf{1}/\mathbf{1}^{\mathsf T} V^{-1}\mathbf{1}$.
Consider the mean-variance optimization problem


* “Toward a theory of market value of risky assets”. Unpublished manuscript, 1961. 
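
\[
\begin{array}{rl}
\min\limits_{x} & x^{\mathsf T} V x \\
\text{s.t.} & \mu^{\mathsf T} x \ge \bar\mu \\
 & \mathbf{1}^{\mathsf T} x = 1.
\end{array}
\tag{6.22}
\]

Now consider the following variations:

\[
\begin{array}{rl}
\max\limits_{x} & \mu^{\mathsf T} x \\
\text{s.t.} & x^{\mathsf T} V x \le \bar\sigma^2 \\
 & \mathbf{1}^{\mathsf T} x = 1
\end{array}
\tag{6.23}
\]

and
(6.24)


Let $x^*$ be the optimal solution to (6.22). Find appropriate values of
$\bar\sigma$ and $\gamma$ so that the optimal solutions to (6.23) and (6.24) are
also $x^*$.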


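Exercise 6.3 Prove that

\[ \sum_{j=1}^{n} |x_j - x_j^0| \le h \]

if and only if there exists a vector $y = [y_1\ \cdots\ y_n]^{\mathsf T}$ such
that

\[
\begin{array}{l}
x_j - x_j^0 \le y_j, \quad j = 1,\dots,n \\
x_j^0 - x_j \le y_j, \quad j = 1,\dots,n \\
\sum\limits_{j=1}^{n} y_j \le h.
\end{array}
\]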
j=l 


Exercise 6.4 Prove that under the two assumptions made in Section 6.4, the 
maximum Sharpe ratio problem (6.16) is indeed equivalent to (6.18). 


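Exercise 6.5 The purpose of this exercise is to prove Proposition 6.4.
Assume the covariance matrix of asset returns $V$ is positive definite and the
minimum-risk portfolio
$(1/\mathbf{1}^{\mathsf T} V^{-1}\mathbf{1})\,V^{-1}\mathbf{1}$ has positive
expected return; that is, $\mu^{\mathsf T} V^{-1}\mathbf{1} > 0$.


(a) Show that (6.19) can be rewritten as follows:

\[
\begin{array}{rl}
\min\limits_{z,\kappa} & z^{\mathsf T} V z \\
\text{s.t.} & \mu^{\mathsf T} z = 1 \\
 & \mathbf{1}^{\mathsf T} z - \kappa = 0 \\
 & \kappa \ge 0.
\end{array}
\tag{6.25}
\]

(b) Show that the solution to (6.25) is

\[
z^* = \frac{1}{\mu^{\mathsf T} V^{-1}\mu}\,V^{-1}\mu,
\qquad
\kappa^* = \frac{\mathbf{1}^{\mathsf T} V^{-1}\mu}{\mu^{\mathsf T} V^{-1}\mu}.
\]

(c) Use part (b) to conclude that the solution to (6.19) is indeed

\[ x^* = \frac{1}{\mathbf{1}^{\mathsf T} V^{-1}\mu}\,V^{-1}\mu. \]

(d) *Show that if $\mu^{\mathsf T} V^{-1}\mathbf{1} < 0$ then (6.19) is bounded
but does not attain its maximum value. Use this fact to illustrate why the two
assumptions made in Section 6.4 cannot simply be dropped without making some
other assumptions.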


Exercise 6.6 The Excel spreadsheet “Exercise 6.6 Six Stocks” provides
hypothetical estimates of the expected return and variance-covariance matrix for
a set of six stocks.


(a) Set up a quadratic programming model to determine the long-only
    minimum-variance portfolio that can be constructed with the six stocks.
    What is the expected return of your minimum-variance portfolio?
(b) Set up the classical Markowitz model with long-only constraints. Solve your
    model for at least six different levels of expected return ranging from the
    level found in part (a) up to the largest expected return level for which
    there are feasible portfolios. What is the value of such largest return level?
    Use your results to generate the expected return versus standard deviation
    plot for the efficient frontier.
(c) Assume the “benchmark” is a portfolio equally divided among the six stocks.
    Compute the beta of each stock (with respect to this benchmark) and the
    consensus (i.e., CAPM) returns assuming the risk-free rate is zero.
(d) Assume that your current portfolio is the benchmark, i.e., it is equally
    divided among the six stocks. Include an additional total turnover constraint
    of 70% in the model from part (b). Determine the new optimal portfolio for
    a desired expected return somewhere in the middle of the range used in
    part (b). Is it possible to find portfolios with any return level in the range
    in part (b)? If it is not, can you explain why?
(e) Find the portfolio with maximum Sharpe ratio subject to all constraints in
    part (d). Again, assume the risk-free rate is zero.


Exercise 6.7 The Excel spreadsheet “Exercise 6.7 Twenty Stocks” contains 
estimated expected values, standard deviations, and correlations of monthly 
returns for a set of 20 large-capitalization stocks from the S&P 500. 


(a) Find the fully invested long-only portfolio with minimum variance.
    Find the numerical values of the first two and last two positions in your
    portfolio (i.e., those of BOL, NE, and XTO, ABC). These numbers are
    between 0 and 1.
    Find the numerical value of the variance of the portfolio (in bps$^2$).
(b) Assume the benchmark is an equally weighted portfolio of the 20 assets.
    Determine the beta of each asset relative to this benchmark.
    Find the numerical values of the beta of the first two stocks and last stock.
(c) Find the fully invested long-only portfolio with highest expected return that
    satisfies the following constraints:
    • The size of every position is at most 10%.
    • The portfolio has beta equal to 1.
    Find the numerical values of the first and last positions in your portfolio.
    Find the numerical value of the expected return of the portfolio (in bps).
(d) Assume the risk-free rate is zero. Find the fully invested long-only portfolio
    with highest Sharpe ratio that satisfies the following constraints:
    • The size of every position is at most 10%.
    • The portfolio has beta equal to 1.
    Find the numerical values of the positions 15 and 16 in your portfolio (i.e.,
    those of LH and R).
    Find the numerical value of the Sharpe ratio of the portfolio.


Exercise 6.8 Suppose M is the market portfolio in a universe of securities. 
According to the CAPM, the excess return of each security is given by 


ri = pirm +&, 


where e; is the zero-mean, security-specific risk, and rą is the market excess 
return. For simplicity assume the risk-free rate is zero. 

Suppose that via a thorough security analysis a manager identifies an active 
portfolio A whose return is 


\[ r_A = \alpha_A + \beta_A r_M + \epsilon_A; \]


let $\omega_A^2 = \operatorname{var}(\epsilon_A)$ denote the residual variance of the active portfolio A.
Consider a portfolio P obtained by investing a proportion $w$ in the active
portfolio A and the remaining proportion $1 - w$ in the market portfolio:


\[ r_P(w) = w r_A + (1 - w) r_M. \]


(a) Find the expressions for the expected return and variance of the portfolio P. 
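(b) Assume $\beta_A = 1$, $\alpha_A > 0$, and $\mu_M > 0$. Show that the portfolio P with
highest Sharpe ratio is attained for the following proportion value:

\[ w_0 = \frac{\alpha_A/\omega_A^2}{\mu_M/\sigma_M^2}. \]

Furthermore, show that, for this proportion value, the Sharpe ratio $S_P$ of the
portfolio satisfies

\[ S_P^2 = S_M^2 + \left(\frac{\alpha_A}{\omega_A}\right)^2 = \left(\frac{\mu_M}{\sigma_M}\right)^2 + \left(\frac{\alpha_A}{\omega_A}\right)^2. \]
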

Exercise 6.9 Suppose the covariance matrix of a universe of N stocks has the
following single-factor risk model form: 


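\[ V = \sigma_M^2\,\beta\beta^{\mathsf T} + D. \]


Here $\sigma_M^2$ is the single-factor risk, $\beta = [\beta_1 \cdots \beta_N]^{\mathsf T} \in \mathbb{R}^N$ is the vector of
stock loadings on that factor, and $D = \operatorname{diag}(\omega_1^2,\dots,\omega_N^2)$ where each $\omega_i^2$ is the
idiosyncratic risk of stock $i$ for $i = 1,\dots,N$.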


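(a) Recall that the Sherman–Morrison–Woodbury matrix inverse formula is

\[ (A + uv^{\mathsf T})^{-1} = A^{-1} - \frac{A^{-1}uv^{\mathsf T}A^{-1}}{1 + v^{\mathsf T}A^{-1}u}, \]

provided $A^{-1}$ exists and $1 + v^{\mathsf T}A^{-1}u \neq 0$. Use this formula to show that

\[ V^{-1} = D^{-1} - \frac{\sigma_M^2}{1 + \sigma_M^2\,\beta^{\mathsf T}D^{-1}\beta}\,D^{-1}\beta\beta^{\mathsf T}D^{-1}. \]

(b) *Using (a) conclude that the holdings of the minimum-variance fully invested
portfolio $(1/\mathbf{1}^{\mathsf T}V^{-1}\mathbf{1})\cdot V^{-1}\mathbf{1}$ are given by

\[ x_i = \frac{\sigma_{MV}^2}{\omega_i^2}\left(1 - \frac{\beta_i}{\beta_{LS}}\right), \quad i = 1,\dots,N, \]

where

\[ \sigma_{MV}^2 = \frac{1}{\mathbf{1}^{\mathsf T}V^{-1}\mathbf{1}} \]

is the variance of the minimum-variance portfolio and $\beta_{LS}$ is the following
long-short threshold beta:

\[ \beta_{LS} = \frac{1 + \sigma_M^2\,\beta^{\mathsf T}D^{-1}\beta}{\sigma_M^2\,\mathbf{1}^{\mathsf T}D^{-1}\beta}. \]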


(c) *Show that the holdings of the long-only minimum-variance portfolio of the
N stocks are given by an expression similar to that in part (b) above:

\[ x_i = \frac{\sigma_{LMV}^2}{\omega_i^2}\,\max\left(0,\; 1 - \frac{\beta_i}{\beta_L}\right), \quad i = 1,\dots,N, \]

where $\sigma_{LMV}^2$ is the variance of the long-only minimum-variance portfolio
and $\beta_L$ is a suitable long-only threshold beta.


Exercise 6.10 Suppose the covariance matrix of a set of assets has the following 
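constant-correlation form: for some $\rho \in (0, 1)$

\[ V_{ii} = \sigma_i^2, \quad V_{ij} = \rho\,\sigma_i\sigma_j, \quad \text{for } i = 1,\dots,n \text{ and } j = 1,\dots,n \text{ with } i \neq j. \]

In matrix form, we can write the above constant-correlation matrix as follows:

\[ V = \rho\,\sigma\sigma^{\mathsf T} + (1 - \rho)\operatorname{Diag}(\sigma)^2, \]

where $\sigma$ is the vector with components $\sigma_i$, $i = 1,\dots,n$.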




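(a) Use the Sherman–Morrison–Woodbury formula to show that

\[ V^{-1} = \frac{1}{1-\rho}\operatorname{Diag}(\theta)^2 - \frac{\rho}{(1-\rho)(1+(n-1)\rho)}\,\theta\theta^{\mathsf T}, \]

where $\theta$ is the vector with components $\theta_i = 1/\sigma_i$, $i = 1,\dots,n$.

(b) Conclude that the holdings $x_i$, $i = 1,\dots,n$, of the fully invested,
minimum-risk portfolio are as follows:

\[ x_i = \frac{y_i}{\sum_{j=1}^n y_j}, \]

where

\[ y_i = \frac{1}{(1-\rho)\sigma_i^2}\left[1 - \frac{\rho}{1+(n-1)\rho}\sum_{j=1}^n \frac{\sigma_i}{\sigma_j}\right]. \]

(c) Assume all of the assets have the same volatility: that is, $\sigma_1 = \sigma_2 = \cdots = \sigma_n = \sigma$.
Prove that the variance of the fully invested portfolio of minimum
variance is

\[ \sigma_{\min}^2 = \frac{\sigma^2(1+(n-1)\rho)}{n}. \]

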
Exercise 6.11 The purpose of this exercise is to detail the derivation of factor 
portfolios used in the construction of risk models. Consider the following factor 
model for a vector of returns 
r= Bf +u, 


where B is a given matrix of factor loadings and the factors f are to be con- 
structed. A common approach to construct the factors f is to solve the following 
kind of weighted least-squares problem: 


\[ \min_f\; (r - Bf)^{\mathsf T}D^{-1}(r - Bf), \tag{6.26} \]

where $D$ is a symmetric (often diagonal) positive definite matrix.

(a) Show that the gradient of the multivariate function $f \mapsto (r - Bf)^{\mathsf T}D^{-1}(r - Bf)$ is

\[ 2(B^{\mathsf T}D^{-1}B)f - 2B^{\mathsf T}D^{-1}r. \]

(b) Conclude that the solution to (6.26) is

\[ f = (B^{\mathsf T}D^{-1}B)^{-1}B^{\mathsf T}D^{-1}r. \]

(c) Consider the special case when $B = b$ has only one column, i.e., there is only
one factor $f$. Conclude that in this case the above optimal $f$ is the return of
the following “characteristic portfolio”:

\[ \frac{1}{b^{\mathsf T}D^{-1}b}\,D^{-1}b. \]

(d) Consider again part (c) and the very special case when the entries of $b$ are $1$
(a “buy” list) and $-1$ (a “sell” list) and $D$ is a diagonal matrix. Show that
in this case the characteristic portfolio in part (c) is a long-short portfolio
with long holdings in the “buy list” and short holdings in the “sell list”.
Describe the values of the portfolio holdings when $D = I$.


Exercise 6.12 Consider two portfolio managers. One has 25 years of perfor- 
mance history, with a realized Sharpe ratio of 0.5. The other one has only four 
years of performance history but with a realized Sharpe ratio of 0.75. Which 
one would you prefer to invest in and why? The objective is to minimize the 
likelihood that you will lose money. Returns can be assumed to be stationary 
and normally distributed. 


Case Studies 


Asset Allocation 


The goal of this case study is to apply and test mean-variance optimization 
models as a tool for asset allocation. 


(1) Choose between four and ten asset classes and collect their monthly, quar- 
terly, or annual historical returns over a meaningful horizon (several years 
or decades). Collect also any other relevant data that may help you forecast 
expected returns. Briefly discuss why you would like to choose these assets 
and why the selected horizon is appropriate. 

(2) Use the first 67% portion of your data to compute the expected returns and 
the variance-covariance matrix for these assets. (Use the remaining 33% for 
out-of-sample testing.) 

(3) Set up the classical Markowitz model without short sales and solve it in 
Excel Solver or MATLAB for various levels of expected return. 

(4) Evaluate, compare, and report your results in-sample and out-of-sample. 

(5) Discuss the results of your model. 


Covariance Estimation 


The goal of this case study is to compare various approaches to covariance 
estimation and risk diversification. 


(1) Select a universe of at least 25 stocks. Some possible choices are the Dow 
Jones Industrial Average, the S&P 100, and the Nasdaq 100. If you feel 
ambitious, you may choose a larger universe. Your purpose is to construct 
the most diversified fully invested portfolio in this universe. Collect weekly 
or monthly historical returns for securities in your universe over a horizon 
of a few years. 

(2) Use the first 67% portion of historical data for “model calibration” (estimates 
of covariance matrix) and the remaining 33% for out-of-sample testing. 

(3) Use the in-sample data to generate the following two estimates of the covariance
matrix: the sample covariance, and a single-factor model covariance.
For the latter, you need to choose a suitable benchmark portfolio. Some
reasonable choices are a value-weighted portfolio and an equally weighted
portfolio.

(4) Using the two estimates of the covariance matrix computed in (3), find
minimum-risk fully invested portfolios (both long-short and long-only).

(5) Compare the results of your models on out-of-sample data. Generate plots
of the value of the different portfolios on out-of-sample data.

(6) Repeat (5) using a rolling-time window assuming that all portfolios are
rebalanced monthly. Compare these models with value-weighted and equally
weighted portfolios.

(7) Report statistics such as out-of-sample mean and standard deviation of
results, and average portfolio turnover. Comment on your results.


Active Portfolio Management 


The goal of this case study is to apply mean-variance optimization as a tool for
active portfolio management. If you are well versed in the Bloomberg terminal,
you may use Bloomberg’s portfolio analytics capabilities (PORT).


(1) Choose at least 20 securities within an asset class (e.g., stocks in the Dow
Jones, the S&P 500, the NASDAQ, or the Russell 3000) and find their
weekly or monthly historical returns over a meaningful horizon. Collect also
relevant additional data for alpha estimation, for instance Fama–French
factors (book-to-market ratio, size, momentum), or any other factors that
you can use to rank your stocks. Briefly discuss your selection of securities
and data.

(2) Choose a suitable “benchmark portfolio”. For instance, if you chose stocks
from the S&P 500, a reasonable benchmark would be a value-weighted
portfolio of the set of selected assets.

(3) Use the first 67% portion of historical data for “model calibration” (estimates
of covariance matrix, betas, alphas, etc.) and the remaining 33% for
out-of-sample testing.

(4) Use the in-sample data to estimate the covariance matrix, betas and alphas
of your stocks. The most straightforward way to estimate the betas of your
stocks is via linear regression. This would also give you a rudimentary
estimate of the alphas. However, these are “realized” estimates, i.e., they are
backward looking. Instead you may try to forecast alphas (i.e., be forward
looking) via one of the following approaches:
• Momentum factor: rank stocks according to how they have performed in
the recent three to twelve months.
• Other factors: rank stocks according to other factors such as the
Fama–French factors, price-to-earnings ratio, debt-to-equity ratio, or some
combination of these.


(5) Set up an optimization model with the goal of constructing portfolios that
outperform the benchmark. Discuss your selection of objective and constraints
in your model.

(6) Test the results of your model on out-of-sample data. You may want to
do this for various combinations of constraint levels (e.g., small and large
levels of tracking errors, small and large levels of active positions, turnover
constraint). The most interesting way of doing this is via a “rolling-time
window”. To that end, proceed as follows:

(a) Partition the out-of-sample data into m equally sized time intervals, e.g.,
month-long intervals.

(b) Using the estimates from (4), find the optimal portfolio. Assume you
hold this portfolio over the first of the m out-of-sample time intervals.

(c) Next, shift the in-sample time window used in step (b). Keep the
length of the in-sample time window unchanged. Use this new in-sample
time window to update your estimates of covariance matrix, betas,
and alphas. Find the new optimal portfolio. Assume you will hold this
portfolio over the next out-of-sample time interval.

(d) Repeat step (c) until you reach the last (mth) out-of-sample time
interval.

(7) Report and comment on your results.
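
The bookkeeping of the rolling-time window in steps (a) to (d) can be sketched in a few lines of Python. The sketch below only generates the index ranges of each in-sample window and of the out-of-sample interval that follows it. The function name `rolling_windows` and the sizes used in the example (120 monthly observations, an 80-month in-sample window, one-month intervals) are illustrative assumptions, not part of the case study.

```python
def rolling_windows(n_periods, in_sample_len, interval_len):
    """Yield (in_sample, out_of_sample) index ranges for a rolling-time window.

    The in-sample window keeps a fixed length and is shifted forward by one
    out-of-sample interval at a time, as in steps (b) to (d).
    """
    start = 0
    while start + in_sample_len + interval_len <= n_periods:
        split = start + in_sample_len
        yield range(start, split), range(split, split + interval_len)
        start += interval_len


# Hypothetical sizes: 120 months of data, roughly the first 67% in sample.
windows = list(rolling_windows(n_periods=120, in_sample_len=80, interval_len=1))
first_in, first_out = windows[0]
last_in, last_out = windows[-1]
```

Each pair of ranges can then be used to slice the return history: estimate the inputs on the first range, optimize, and record the performance of the resulting portfolio on the second range.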


Sensitivity of Mean–Variance Models to Input Estimation


One of the most salient drawbacks of mean-variance optimization is its high 
sensitivity to the estimation of input parameters. The sensitivity is due to the 
very nature of the optimization process: if there are assets whose returns appear 
to be superior, the portfolios generated by an optimization procedure will try to 
take advantage of these apparently superior assets by overweighting the holdings 
on those positions. Unfortunately in a practical setting there is inevitable noise 
in the estimation of inputs to a mean—variance model. Small perturbations in the 
values of the inputs may lead to large swings in the composition of the portfolio. 
This unfortunate phenomenon is basically due to the fact that the optimizer is 
overly responsive given the quality of the inputs typical in portfolio construction. 
A related phenomenon is the fact that the composition of portfolios is often 
non-intuitive. Theoretical and empirical evidence indicates that the estimate of 
expected returns is more critical than the estimate of the covariance matrix. 

The sensitivity of mean-variance models to input estimation manifests itself 
in the differences among the true, estimated, and actual efficient frontiers, terms 
coined by Broadie (1993). The true efficient frontier is the one computed with 
the true (unobservable) expected returns and covariance matrix. The estimated 
frontier is the one computed with estimates of these parameters. The actual 
frontier is defined as follows: take the portfolios in the estimated frontier and 
calculate their true expected returns and variances. The actual frontier always 
lies below the true frontier. In principle the estimated frontier may lie anywhere 
with respect to the true frontier. However, due to the optimization process, if the 
estimation errors have zero mean, the estimated frontier is likely to lie above the 
true frontier. In that case the actual frontier would be well below the estimated 
frontier. Equivalently, the ex post performance of estimated efficient portfolios 
would typically be substantially worse than their ex ante performance suggested 
by the mean-variance model. Figure 7.1 illustrates the typical relative placement
of the three frontiers. 

The following specific example illustrates the sensitivity of mean-variance 
models to the quality of the inputs. 

Consider a simple portfolio optimization problem with three assets whose 
expected returns and covariance matrix are 


\[ \mu = \begin{bmatrix} 0.11 \\ 0.10 \\ 0.05 \end{bmatrix}, \quad
V = \begin{bmatrix} 0.250 & 0.225 & 0.045 \\ 0.225 & 0.250 & 0.045 \\ 0.045 & 0.045 & 0.090 \end{bmatrix}. \tag{7.1} \]

Figure 7.1 Efficient frontiers


Figure 7.2 displays the composition of long-only efficient portfolios for this 
problem. 


Figure 7.2 Area chart of long-only efficient portfolios for $\mu$ and $V$ as in (7.1) (legend: asset 1, asset 2, asset 3)


The picture makes sense from the pure optimization standpoint: assets 1 and 2 
are similar but the expected return of asset 1 is slightly larger. Hence for higher 
target expected returns, the efficient portfolios have a much larger holding in 
asset 1 than in asset 2. However, from the portfolio construction standpoint this 
is unintuitive: assets 1 and 2 are very similar and, for all practical purposes, 
exchangeable because the slight difference could easily be due to estimation 
error. Therefore, it would be more intuitive for the positions of these two assets 
to be roughly the same. We can also look at the problem in a different way: 
Suppose the expected returns of assets 1 and 2 were slightly perturbed so that 
they are swapped. Then the composition of the efficient portfolios would change 
drastically. This again is a fairly counterintuitive and unnatural behavior. 
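
The swap described above is easy to reproduce numerically. The sketch below uses the data in (7.1) but, to stay within a closed-form solution, it drops the long-only constraint and maximizes $\mu^{\mathsf T}x - (\delta/2)\,x^{\mathsf T}Vx$ subject to $\mathbf{1}^{\mathsf T}x = 1$. The risk-aversion value `delta = 2.0`, the helper `solve`, and the function name are assumptions made only for this illustration; the long-only portfolios of Figure 7.2 would require a quadratic programming solver instead.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def mean_variance_portfolio(mu, V, delta):
    """Maximize mu'x - (delta/2) x'Vx subject to 1'x = 1 (short sales allowed)."""
    n = len(mu)
    a = solve(V, [1.0] * n)            # V^{-1} 1
    b = solve(V, mu)                   # V^{-1} mu
    eta = (sum(b) - delta) / sum(a)    # multiplier that enforces 1'x = 1
    return [(b[i] - eta * a[i]) / delta for i in range(n)]


V = [[0.250, 0.225, 0.045],
     [0.225, 0.250, 0.045],
     [0.045, 0.045, 0.090]]
delta = 2.0                            # assumed risk-aversion coefficient

x = mean_variance_portfolio([0.11, 0.10, 0.05], V, delta)
x_swapped = mean_variance_portfolio([0.10, 0.11, 0.05], V, delta)
```

Subtracting the optimality conditions of assets 1 and 2 gives $x_1 - x_2 = (\mu_1 - \mu_2)/(0.025\,\delta)$, so a difference of one percentage point in expected return translates into a gap of $0.4/\delta$ between the two holdings. Swapping the two expected returns swaps the two holdings.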

The input sensitivity of mean-variance models is a central issue in portfolio 
management and has been a subject of intense study. There is a tremendous 
upside potential in finding appropriate ways of harnessing the power of portfolio 
optimization without getting caught on this major shortcoming. We will next 
describe some of the most popular techniques that aim at mitigating this problem.
The techniques can be classified in two main categories. The first category of
techniques tries to improve the quality of the inputs to the portfolio optimization
problem. The second category of techniques aims to tweak the optimization
procedure. One of the most widely used techniques in the first category is the
Black–Litterman model introduced by Fischer Black and Bob Litterman at Goldman
Sachs Asset Management. We will discuss this technique in some detail. We will
also briefly describe another related technique based on Bayesian adjustments.
We will subsequently discuss some techniques in the second category, namely
resampled efficiency and robust optimization.


Black—Litterman Model 


The basic idea of the Black—Litterman model is to tilt the market equilibrium 
returns to incorporate an investor’s views. In principle a classical mean-variance 
model requires estimates of expected returns for all assets in the investment 
universe considered. This is typically an enormous task. Investment managers 
are unlikely to have detailed knowledge of all securities at their disposal. Typi- 
cally, they have a specific area of expertise. Furthermore, some modern trading 
strategies are associated not with absolute but with relative rankings of securities. 
For instance, a pairs trading strategy corresponds to a forecast that one stock will 
outperform another one. The key insight of Black and Litterman was that there is 
a suitable way of combining the investor’s views with the market equilibrium. The 
exposition of the Black—Litterman model below is based on Black and Litterman 
(1992), Fabozzi et al. (2007), and Litterman (2003). 


Basic Assumption and Starting Point 


The Black—Litterman model is an equilibrium-based model, meaning that the 
expected returns of the assets should be consistent with the market equilibrium 
unless the investor has some specific views. In other words, an investor without 
any views on the market should hold the market. We shall let $\pi$ denote the
equilibrium return vector, and $V$ the covariance matrix of the asset returns.

The true expected return vector $\mu$ is unknown. As a starting point, we assume
that the equilibrium return vector serves as a reasonable prior estimate of the
true return vector in the sense that

\[ \mu \sim N(\pi, Q). \]

That is, $\mu$ is a multivariate normal random vector with expected value $\pi$ and covariance
matrix $Q$. The matrix $Q$ represents the confidence in the equilibrium returns as
an estimate of expected returns.

Expressing Investors’ Views


A key ingredient of the Black—Litterman model is to incorporate investors’ views 
on the expected returns. The framework is fairly flexible. An investor may have a 
few different views, each of them involving either a single asset (an absolute view) 
or several assets (a relative view). Formally, a collection of views is expressed as 


\[ P\mu = q + \epsilon, \quad \epsilon \sim N(0, \Omega). \]


Each row in the equation $P\mu = q + \epsilon$ is a view that represents a forecast. The
term $\epsilon$ represents the degree of confidence in the views. The covariance matrix $\Omega$
is typically a diagonal matrix. A weak view is a view with large variance; a strong
view is a view with small variance. In the extreme, a certain view is a view with
zero variance. Each of the views can be absolute or relative as described above.
For a concrete example, consider an asset allocation problem with seven asset 
classes: Australia, Canada, France, Germany, Japan, United Kingdom, United 
States. Suppose we have two views: 


e Return on Germany will be 12%. 
e UK will outperform US by 2%. 


These views can be expressed as 


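\[ \begin{aligned} \mu_4 &= 12\% + \epsilon_1 \\ \mu_6 - \mu_7 &= 2\% + \epsilon_2. \end{aligned} \]


In matrix notation this corresponds to $P\mu = q + \epsilon$ for


\[ P = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & -1 \end{bmatrix}, \quad
q = \begin{bmatrix} 12\% \\ 2\% \end{bmatrix}, \quad
\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \end{bmatrix}. \]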
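
As a small illustration, the view matrix and view vector of this seven-asset example can be assembled programmatically. The helper name `view_row` and the short labels "UK" and "US" are assumptions introduced only for this sketch.

```python
assets = ["Australia", "Canada", "France", "Germany", "Japan", "UK", "US"]


def view_row(assets, weights):
    """Return one row of P: the coefficient of each asset's expected return in a view."""
    return [weights.get(name, 0) for name in assets]


# View 1 (absolute): return on Germany will be 12%.
# View 2 (relative): UK will outperform US by 2%.
P = [view_row(assets, {"Germany": 1}),
     view_row(assets, {"UK": 1, "US": -1})]
q = [0.12, 0.02]
```

An absolute view has a single nonzero coefficient, while a relative view has coefficients that sum to zero.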


Merging Investors’ Views and Market Equilibrium 


The key insight of the Black—Litterman model is a proper way to combine the 
investor’s views with the prior market equilibrium. First, consider the simpler 
case when the views are assumed to be certain; that is, $\Omega = 0$ or equivalently
the vector of expected returns must be tilted to satisfy the views $P\mu = q$. In
this case the posterior estimate of $\mu$ given the prior $\mu \sim N(\pi, Q)$ and the views
$P\mu = q$ is

\[ \hat\mu = \pi + QP^{\mathsf T}(PQP^{\mathsf T})^{-1}(q - P\pi). \tag{7.2} \]

Some matrix algebra shows that indeed $P\hat\mu = P\pi + (PQP^{\mathsf T})(PQP^{\mathsf T})^{-1}(q - P\pi) = q$.
That is, the posterior estimate satisfies the views $P\mu = q$.

In the more general case when the views are not certain, the posterior estimate
of $\mu$ given the prior $\mu \sim N(\pi, Q)$ and the views $P\mu = q + \epsilon$, with $\epsilon \sim N(0, \Omega)$, is

\[ \hat\mu = \pi + QP^{\mathsf T}(PQP^{\mathsf T} + \Omega)^{-1}(q - P\pi). \tag{7.3} \]

When $\Omega$ is non-singular, $\hat\mu$ also has the following equivalent expression:

\[ \hat\mu = (Q^{-1} + P^{\mathsf T}\Omega^{-1}P)^{-1}(Q^{-1}\pi + P^{\mathsf T}\Omega^{-1}q). \tag{7.4} \]


Observe that the expression (7.2) for certain views can be recovered from (7.3)
by taking $\Omega = 0$.
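
The agreement between (7.3) and (7.4), and the fact that certain views are satisfied exactly, can be checked numerically. The following plain-Python sketch does so for a three-asset universe; all numbers (the equilibrium returns, the prior covariance, the two views and their variances) and all helper names are hypothetical and chosen only for illustration.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def add(A, B, sign=1.0):
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(i == j) for j in range(n)] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def column(v):
    return [[float(x)] for x in v]


def posterior_73(pi, Q, P, q, Omega):
    """Formula (7.3): pi + Q P' (P Q P' + Omega)^{-1} (q - P pi)."""
    QPt = matmul(Q, transpose(P))
    gap = add(column(q), matmul(P, column(pi)), sign=-1.0)
    tilt = matmul(QPt, matmul(inverse(add(matmul(P, QPt), Omega)), gap))
    return [row[0] for row in add(column(pi), tilt)]


def posterior_74(pi, Q, P, q, Omega):
    """Formula (7.4): (Q^{-1} + P' Omega^{-1} P)^{-1} (Q^{-1} pi + P' Omega^{-1} q)."""
    Pt, Q_inv, Omega_inv = transpose(P), inverse(Q), inverse(Omega)
    lhs = add(Q_inv, matmul(Pt, matmul(Omega_inv, P)))
    rhs = add(matmul(Q_inv, column(pi)), matmul(Pt, matmul(Omega_inv, column(q))))
    return [row[0] for row in matmul(inverse(lhs), rhs)]


# Hypothetical inputs: three assets, one absolute view and one relative view.
pi = [0.05, 0.04, 0.03]
Q = [[0.010, 0.002, 0.001],
     [0.002, 0.008, 0.002],
     [0.001, 0.002, 0.006]]
P = [[1, 0, 0],
     [0, 1, -1]]
q = [0.07, 0.02]
Omega = [[0.001, 0.0],
         [0.0, 0.002]]

mu_73 = posterior_73(pi, Q, P, q, Omega)
mu_74 = posterior_74(pi, Q, P, q, Omega)

# Certain views (Omega = 0): formula (7.3) reduces to (7.2) and P mu equals q.
mu_certain = posterior_73(pi, Q, P, q, [[0.0, 0.0], [0.0, 0.0]])
views_certain = [row[0] for row in matmul(P, column(mu_certain))]
```

Formula (7.3) only inverts a matrix whose size is the number of views, which is why it remains usable when the views are certain, whereas (7.4) needs the inverses of both $Q$ and $\Omega$.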

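We next give a derivation of the formula (7.4) for the posterior estimate of
$\mu$ when $\Omega$ is non-singular. The exercises at the end of the chapter show how
the derivation can be tweaked for any $\Omega$. Stack the two equations for the market
equilibrium $\pi = \mu + \epsilon_\pi$, $\epsilon_\pi \sim N(0, Q)$, and for the investor’s views $q = P\mu + \epsilon_q$,
$\epsilon_q \sim N(0, \Omega)$ as

\[ y = M\mu + \epsilon, \quad \epsilon \sim N(0, \Sigma) \]

for

\[ y = \begin{bmatrix} \pi \\ q \end{bmatrix}, \quad
M = \begin{bmatrix} I \\ P \end{bmatrix}, \quad
\Sigma = \begin{bmatrix} Q & 0 \\ 0 & \Omega \end{bmatrix}. \]

The estimation problem can be stated as the following weighted least-squares
problem:

\[ \min_\mu\; (y - M\mu)^{\mathsf T}\Sigma^{-1}(y - M\mu). \]

The optimality conditions for this problem yield

\[ 2M^{\mathsf T}\Sigma^{-1}M\mu - 2M^{\mathsf T}\Sigma^{-1}y = 0. \]

Hence we obtain

\[ \begin{aligned}
\hat\mu &= (M^{\mathsf T}\Sigma^{-1}M)^{-1}M^{\mathsf T}\Sigma^{-1}y \\
&= (Q^{-1} + P^{\mathsf T}\Omega^{-1}P)^{-1}(Q^{-1}\pi + P^{\mathsf T}\Omega^{-1}q) \\
&= \pi + QP^{\mathsf T}(PQP^{\mathsf T} + \Omega)^{-1}(q - P\pi).
\end{aligned} \]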


We end this section with a couple of general remarks. 

First we note that the Black-Litterman model can be thought of as an “inverse 
optimization problem”: If one views the market equilibrium as the optimum 
solution of a portfolio optimization problem, what data would produce this 
outcome? In particular, given investor views, what choice of $\mu$ would best fit the
market equilibrium solution? This leads to the least-squares problem formulated
above, whose solution $\hat\mu$ has the expression (7.3) that we just computed. This
“inverse optimization” philosophy was proposed by Bertsimas et al. (2012). It has 
the added flexibility of allowing investor views on volatility and market dynamics. 

We also note that the analysis in the Black—Litterman model relies heavily 
on the assumption that the error term $\epsilon \sim N(0, \Omega)$ is normally distributed.
When this is not the case, the Black—Litterman framework is still meaningful 
but the analysis is more complex and a closed-form solution like (7.3) does 
not usually exist. However, a numerical solution may be possible using the 
nonlinear programming algorithms discussed in Chapters 18 and 20 (Kocuk and 
Cornuéjols, 2017). 

Shrinkage Estimation


Another approach to improve the quality of estimated expected returns is based 
on shrinkage estimators. These types of estimators are rooted in the classical 
finding of Stein (1956) that biased estimators may, in a formal fashion, be 
superior to the unbiased sample mean. As we detail below, the central idea 
is that the estimation can be improved by shrinking the sample mean towards a 
target. The non-technical article of Efron and Morris (1977) gives an enlightening 
discussion of an application of this approach to the estimation of baseball batting 
averages. 

To formalize ideas, consider the problem of estimating the mean of an
N-dimensional multivariate normal variable $r \sim N(\mu, V)$ from a set of observations
$r_1,\dots,r_T$. For a given estimate $\hat\mu$, consider the quadratic loss function

\[ L(\hat\mu, \mu) := (\mu - \hat\mu)^{\mathsf T}V^{-1}(\mu - \hat\mu). \tag{7.5} \]

For a given loss function, the risk of an estimator is $E(L(\hat\mu, \mu))$, where the
expectation is taken over the space of samples $r_1,\dots,r_T$. An estimator is inadmissible
if there exists another estimator with lower risk.

For the quadratic loss function (7.5) and $N = 1, 2$, it is known that the optimal
estimator is the sample mean $\bar r := (1/T)(r_1 + \cdots + r_T)$. By contrast, for $N > 2$,
the James–Stein shrinkage estimator

\[ \hat\mu_{JS} := (1 - w)\,\bar r + w\,\mu_0\mathbf{1} \]

has lower risk than the sample mean $\bar r$ for

\[ w = \min\left(1,\; \frac{N - 2}{T\,(\bar r - \mu_0\mathbf{1})^{\mathsf T}V^{-1}(\bar r - \mu_0\mathbf{1})}\right). \]

Here $T$ is the number of observations, and $\mu_0$ is an arbitrary number. The vector
$\mu_0\mathbf{1}$ and the weight $w$ are referred to as the shrinkage target and shrinkage
factor respectively. Although some choices of $\mu_0$ are better than others, what is
surprising is that in theory $\mu_0$ could be any fixed number. This fact is called the
Stein paradox.
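
The James–Stein estimator is straightforward to compute once the sample mean is available. The sketch below is a minimal plain-Python illustration; the inputs (a sample mean and covariance matrix borrowed from the three-asset example (7.1), $T = 60$ observations and the target level $\mu_0 = 0.08$) are assumptions made for the example, as are the helper names. It also assumes that the sample mean differs from the target, so that the quadratic form in the denominator is positive.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def james_stein(r_bar, V, T, mu0):
    """Return the shrinkage factor w and the estimator (1 - w) r_bar + w mu0 1."""
    n = len(r_bar)
    d = [r - mu0 for r in r_bar]                        # r_bar - mu0 * 1
    quad = sum(di * vi for di, vi in zip(d, solve(V, d)))  # d' V^{-1} d
    w = min(1.0, (n - 2) / (T * quad))
    return w, [(1 - w) * r + w * mu0 for r in r_bar]


# Hypothetical inputs for illustration.
r_bar = [0.11, 0.10, 0.05]
V = [[0.250, 0.225, 0.045],
     [0.225, 0.250, 0.045],
     [0.045, 0.045, 0.090]]
mu0 = 0.08
w, mu_js = james_stein(r_bar, V, T=60, mu0=mu0)
```

Every component of the shrunk estimate lies between the corresponding sample mean and the target $\mu_0$; the fewer the observations or the closer the sample mean is to the target, the stronger the shrinkage.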

The James-Stein shrinkage estimator can be seen as a combination of two 
estimators: 


(1) an estimator with little or no structure (like the sample mean); 
(2) an estimator with a lot of structure (the shrinkage target). 


The exact combination of these two estimators is determined by a certain shrink- 
age intensity. As we discuss below, this same shrinkage approach has been 
successfully applied to obtain improved estimators of covariance matrices and 
beta exposures. 

The following shrinkage estimator proposed by Jorion (1986) is fairly popular 
in the financial literature. The estimator was derived via an empirical Bayesian 
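approach. As a shrinkage target, use the vector $\mu_0\mathbf{1}$ for

\[ \mu_0 = \frac{\mathbf{1}^{\mathsf T}V^{-1}\bar r}{\mathbf{1}^{\mathsf T}V^{-1}\mathbf{1}}, \]

and as a shrinkage intensity, use

\[ w = \frac{N + 2}{N + 2 + T\,(\bar r - \mu_0\mathbf{1})^{\mathsf T}V^{-1}(\bar r - \mu_0\mathbf{1})}. \]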


Shrinkage can also be applied to other estimation problems. For instance,
Ledoit and Wolf (2003, 2004) propose shrinkage approaches for covariance
estimation in the same spirit as the James–Stein shrinkage estimator: shrink the
sample covariance matrix $\hat V$ (an unstructured estimator) towards a highly
structured target estimator $V_0$:

\[ V_{LW} = (1 - w)\hat V + wV_0. \]

The shrinkage target estimator $V_0$ could be a single-factor estimator, an
estimator of the covariance matrix with constant correlation, a diagonal matrix, or
a multiple of the identity matrix.
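
The following plain-Python sketch shows the convex combination of a sample covariance matrix with two of the targets mentioned above, a diagonal target and a constant-correlation target. The function names, the sample matrix (the numbers of (7.1) are reused purely for illustration) and the intensity $w = 0.5$ are assumptions; in particular, taking the common correlation to be the average of the sample correlations is one simple choice, and the data-driven choice of $w$ studied by Ledoit and Wolf is not reproduced here.

```python
import math


def shrink_covariance(V_hat, V0, w):
    """Return (1 - w) * V_hat + w * V0, entry by entry."""
    return [[(1 - w) * a + w * b for a, b in zip(row_hat, row_0)]
            for row_hat, row_0 in zip(V_hat, V0)]


def diagonal_target(V_hat):
    """Structured target that keeps the variances and sets all covariances to zero."""
    n = len(V_hat)
    return [[V_hat[i][j] if i == j else 0.0 for j in range(n)] for i in range(n)]


def constant_correlation_target(V_hat):
    """Target with the sample variances and one common (average) correlation."""
    n = len(V_hat)
    sd = [math.sqrt(V_hat[i][i]) for i in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    rho = sum(V_hat[i][j] / (sd[i] * sd[j]) for i, j in pairs) / len(pairs)
    return [[V_hat[i][j] if i == j else rho * sd[i] * sd[j] for j in range(n)]
            for i in range(n)]


# Hypothetical sample covariance matrix.
V_hat = [[0.250, 0.225, 0.045],
         [0.225, 0.250, 0.045],
         [0.045, 0.045, 0.090]]

V_diag = shrink_covariance(V_hat, diagonal_target(V_hat), w=0.5)
V_cc_target = constant_correlation_target(V_hat)
V_cc = shrink_covariance(V_hat, V_cc_target, w=0.5)
```

Both targets leave the variances unchanged and only pull the covariances toward a more structured pattern.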

Shrinkage is also routinely used for estimating benchmark exposures in a
stock universe. Let $r_i$, $i = 1,\dots,N$, denote the excess returns of stocks in the
investment universe and $r_B$ denote the excess return of the benchmark. Recall
that the beta of stock $i$ captures the benchmark portion of the return on stock
$i$ via the linear model

\[ r_i = \beta_i r_B + \theta_i. \]

The value of $\beta_i$ is the benchmark exposure of stock $i$. Given historical realizations
of $r_i$, for $i = 1,\dots,N$, and $r_B$, we can obtain estimates $\hat\beta_i$ of $\beta_i$ for $i = 1,\dots,N$
via ordinary least-squares linear regression. The forecasts given by these natural
estimators tend to overestimate the betas of stocks with high benchmark exposure
and underestimate the betas of the stocks with low benchmark exposure.
Improved forecasts can be obtained by shrinking the betas obtained from the
least-squares procedures towards one (the benchmark beta):

\[ \hat\beta_w = (1 - w)\hat\beta + w\mathbf{1}, \]

where $\hat\beta$ denotes the vector of beta estimates from the least-squares procedure.

A common rule of thumb in the above shrinkage estimators of covariance 
matrix and vector of betas is to use a shrinkage intensity w = 1/2 for esti- 
mates based on 60-month long historical data. A thorough discussion on the 
appropriate choice of shrinkage intensity can be found in Ledoit and Wolf (2003, 
2004) and Blume (1975). Portfolio optimization can be viewed as a stochastic 
optimization problem (see Chapter 10). Shrinkage is relevant in this more general 
context as well (Davarnia and Cornuéjols, 2017). 
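
A minimal sketch of the beta shrinkage with the rule-of-thumb intensity $w = 1/2$ follows. The return series are made up so that the least-squares beta is exactly 1.5, and the regression includes an intercept; both choices, as well as the function names, are assumptions made for illustration.

```python
def ols_beta(stock, benchmark):
    """Slope of the least-squares regression of stock on benchmark excess returns."""
    T = len(benchmark)
    mean_s = sum(stock) / T
    mean_b = sum(benchmark) / T
    cov = sum((s - mean_s) * (b - mean_b) for s, b in zip(stock, benchmark))
    var = sum((b - mean_b) ** 2 for b in benchmark)
    return cov / var


def shrink_beta(beta_hat, w=0.5):
    """Shrink a least-squares beta toward one (the benchmark beta)."""
    return (1 - w) * beta_hat + w * 1.0


# Hypothetical excess returns: the stock moves 1.5 times as much as the benchmark.
benchmark = [0.01, -0.02, 0.03, 0.00, 0.015]
stock = [1.5 * b for b in benchmark]

beta_hat = ols_beta(stock, benchmark)
beta_shrunk = shrink_beta(beta_hat)
```

With $w = 1/2$ the shrunk beta is simply the midpoint between the least-squares estimate and one.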

The exercises at the end of this chapter suggest some computational experi- 
ments that illustrate the effectiveness of shrinkage estimators. 

Resampled Efficiency


A different approach to address the sensitivity of mean-variance optimization to 
estimation error is to apply the bootstrap technique from statistics. The bootstrap 
technique is a method to estimate standard errors and confidence intervals of 
statistics of a dataset via random resampling from the dataset with replacement. 

The application of bootstrapping to mean-variance optimization was initially 
explored by Jorion (1992) and later further developed and marketed by Michaud 
and Michaud (2008). The basic idea is to consider the joint problem of parameter 
estimation and portfolio construction as a statistical procedure: the efficient port- 
folios can be seen as a statistic on a set of financial data used for estimation. The 
resampled efficiency technique proposed by Michaud and Michaud proceeds by 
applying bootstrapping to this statistical process. Suppose there is a procedure 
that estimates the vector of expected returns and covariance from historical 
data. Use the available data to produce these estimates and compute efficient 
portfolios. Repeat this same process either by sampling from these estimates, or 
by bootstrapping the available data to obtain new estimates of expected returns 
and covariances. All these estimates are statistically equivalent. For each of them, 
we can generate the corresponding set of efficient portfolios. The collection of all 
of these portfolios forms some sort of equivalence region. We would like to take 
some average of the equivalence region so that the effects of estimation error 
are mitigated. However, it is not obvious how to average since the equivalence 
region contains portfolios with low and high variance. We do not want to mix 
“apples and oranges”. Michaud and Michaud’s suggestion is to average portfolios 
that are in some equivalent risk-return bucket. To that end, we propose the 
following procedure: For each efficient frontier, save m evenly distributed efficient 
portfolios. Rank them 1 to m. Then take averages of same-rank portfolios from 
all efficient frontiers. 

This resampling procedure can be more precisely described as follows (see 
Algorithm 7.1). Suppose we have a procedure to produce estimates $\hat\mu$ and $\hat V$
from a history of $T$ periods of historical data.


Algorithm 7.1 Resampling procedure
for i = 1, ..., S do
    simulate a new history of T periods by resampling the original history
    use the simulated history to generate new estimates $\hat\mu_i$ and $\hat V_i$
    use $\hat\mu_i$ and $\hat V_i$ to generate m equally spaced efficient portfolios
    $x_{1,i},\dots,x_{m,i}$
end for


To generate the resampled efficient portfolios, take averages of equally ranked
efficient portfolios generated above:

\[ x_{j,\text{resampled}} := \frac{1}{S}\sum_{i=1}^{S} x_{j,i}. \]

The resampled efficient frontier is the expected return versus standard deviation
chart of the resampled efficient portfolios with the original estimates $\hat\mu$ and $\hat V$.
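
The resampling procedure can be sketched in plain Python as follows. Several simplifying assumptions are made and should be kept in mind: the history is synthetic, the estimates are the sample mean and sample covariance, and the efficient portfolios of each simulated history are computed in closed form with short sales allowed, as $x_{\mathrm{mv}} + t\,z$ on an even grid of risk-tolerance values $t$, rather than by solving long-only quadratic programs. All function names and parameter values are illustrative.

```python
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def mean_and_cov(history):
    """Sample mean vector and sample covariance matrix of a list of return vectors."""
    T, n = len(history), len(history[0])
    mu = [sum(row[j] for row in history) / T for j in range(n)]
    V = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in history) / (T - 1)
          for j in range(n)] for i in range(n)]
    return mu, V


def efficient_portfolios(mu, V, m, t_max=0.05):
    """m >= 2 fully invested efficient portfolios x_mv + t z, t on an even grid in [0, t_max]."""
    n = len(mu)
    a = solve(V, [1.0] * n)                          # V^{-1} 1
    b = solve(V, mu)                                 # V^{-1} mu
    x_mv = [ai / sum(a) for ai in a]                 # minimum-variance portfolio
    eta = sum(b) / sum(a)
    z = [bi - eta * ai for ai, bi in zip(a, b)]      # zero-sum tilt toward higher return
    return [[x_mv[i] + (t_max * j / (m - 1)) * z[i] for i in range(n)] for j in range(m)]


def resampled_portfolios(history, S, m, seed=0):
    """Average equally ranked efficient portfolios over S bootstrap histories."""
    rng = random.Random(seed)
    T, n = len(history), len(history[0])
    average = [[0.0] * n for _ in range(m)]
    for _ in range(S):
        sample = [rng.choice(history) for _ in range(T)]   # draw T periods with replacement
        mu_i, V_i = mean_and_cov(sample)
        for j, x in enumerate(efficient_portfolios(mu_i, V_i, m)):
            for k in range(n):
                average[j][k] += x[k] / S
    return average


# Synthetic history of 60 periods for three assets (illustration only).
data_rng = random.Random(1)
history = [[data_rng.gauss(0.010, 0.05), data_rng.gauss(0.008, 0.04), data_rng.gauss(0.005, 0.02)]
           for _ in range(60)]
resampled = resampled_portfolios(history, S=50, m=5)
```

Because every portfolio being averaged is fully invested, each resampled portfolio is fully invested as well.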

There are a number of limitations to resampling (Scherer, 2002). The entire 
process is only a heuristic; there is no sound theory to support why the process 
should mitigate the effects of estimation error. The methodology does have the 
feature of generating portfolios that look well diversified, and this is generally 
well received. However, this feature can be attributed to the role of variability 
in the averaging process. The process is computationally intensive, as it multiplies
the work involved in conventional mean-variance optimization. Furthermore, the 
procedure does not provide any clear mechanism to facilitate the incorporation 
of views as in the Black-Litterman model. 


Robust Optimization 


Robust optimization is a fairly recent development that considers uncertainty in 
some parameters directly in the optimization problem. The general idea of robust 
optimization is to generate a solution that is good for all possible realizations 
of the uncertain parameters. Consider a minimization problem with inequality 
constraints 


\[ \begin{aligned} \min_x\quad & f(x, p) \\ \text{s.t.}\quad & g_i(x, p) \le 0, \quad i = 1,\dots,m. \end{aligned} \tag{7.6} \]


Here the vector p stands for some parameters that define the objective and 
constraints functions. 

Consider first the case when the uncertain parameters occur in the constraints 
only. Assume that the set of parameters p is uncertain but it is known to be in 
some uncertainty set $\mathcal{U}$. In this case a robust version of (7.6) is one where the
optimization is performed over points that are feasible for all possible realizations
of the uncertain parameters $p \in \mathcal{U}$; that is,

\[ \begin{aligned} \min_x\quad & f(x) \\ \text{s.t.}\quad & \max_{p \in \mathcal{U}} g_i(x, p) \le 0, \quad i = 1,\dots,m. \end{aligned} \]


On the other hand, consider the case when the uncertain parameters occur in 
the objective only. In this case a robust version of (7.6) is one that finds the 
solution that would be best, given the worst possible realization of the uncertain 
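parameters $p \in \mathcal{U}$; that is,

\[ \begin{aligned} \min_x \max_{p \in \mathcal{U}}\quad & f(x, p) \\ \text{s.t.}\quad & g_i(x) \le 0, \quad i = 1,\dots,m. \end{aligned} \]

If uncertain parameters occur in both the objective and constraints, then the
robust version is as follows:

\[ \begin{aligned} \min_x \max_{p \in \mathcal{U}}\quad & f(x, p) \\ \text{s.t.}\quad & \max_{p \in \mathcal{U}} g_i(x, p) \le 0, \quad i = 1,\dots,m. \end{aligned} \]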


As we detail in Chapter 19, for suitable types of uncertainty sets the above robust 
versions can be rewritten as an optimization problem that is manageable albeit 
via more involved optimization machinery. 
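
To make the min-max idea concrete, the toy sketch below compares a nominal and a robust choice when the uncertainty occurs in the objective only. It assumes a finite uncertainty set of three scenarios for the expected returns of two assets and searches over a coarse grid of long-only, fully invested portfolios; the data, the grid and the function names are hypothetical, and a brute-force search of this kind is only meant to illustrate the definition, not the reformulations of Chapter 19.

```python
# Hypothetical uncertainty set: three scenarios for the expected returns of two assets.
U = [(0.10, 0.04), (0.02, 0.05), (0.07, 0.045)]

# Candidate portfolios: long-only, fully invested, on a grid of 10% steps.
candidates = [(k / 10, 1 - k / 10) for k in range(11)]


def f(x, p):
    """Objective to minimize: the negative expected return of portfolio x under p."""
    return -(p[0] * x[0] + p[1] * x[1])


def worst_case(x, U):
    """Largest (worst) objective value of x over all scenarios in U."""
    return max(f(x, p) for p in U)


nominal_p = U[0]                                            # scenario taken at face value
x_nominal = min(candidates, key=lambda x: f(x, nominal_p))
x_robust = min(candidates, key=lambda x: worst_case(x, U))
```

The nominal solution puts everything in the first asset, which is best under the first scenario but exposed to the second one. The robust solution gives up some return under the first scenario in exchange for a better worst case.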


Other Diversification Approaches 


The challenges associated with expected return estimation and the input sensitiv- 
ity of mean-variance models have given rise to quantitative portfolio construction 
approaches that eschew expected return estimation and focus on managing risk 
only. We next discuss some popular approaches of this kind that have led to 
the development of a variety of investment products in the asset management 
industry. 

Assume $V$ is the covariance matrix of asset returns in some investment universe
and $\sigma$ is the vector of volatilities (standard deviations) of the asset returns. In
particular, the diagonal entries of $V$ are the squares of the entries of $\sigma$.

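The minimum-risk portfolio is the portfolio on the efficient frontier with minimum
variance. In the absence of constraints, this portfolio is the solution to the
following quadratic programming model:

$$
\begin{array}{ll}
\min_{x} & x^T V x \\
\text{s.t.} & \mathbf{1}^T x = 1.
\end{array}
$$

That is,

$$
x^* = \frac{1}{\mathbf{1}^T V^{-1} \mathbf{1}} \, V^{-1} \mathbf{1}.
$$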


For the special case $V = I$ (the $N \times N$ identity matrix) the minimum-risk
portfolio is the so-called equally weighted portfolio

$$
x^* = \frac{1}{N} \mathbf{1},
$$

where $N$ is the number of assets in the universe.
On the other hand, if $V$ is diagonal, that is, $V = \operatorname{diag}(\sigma)^2$, then the portfolio
components are proportional to the inverse of the squares of the volatilities:

$$
x_i^* = \frac{1/\sigma_i^2}{\sum_{j=1}^{N} 1/\sigma_j^2}, \quad i = 1, \dots, N.
$$

In particular, this portfolio is the value-weighted portfolio if the capitalization of
asset $i$ is used as a proxy for $1/\sigma_i^2$.
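The closed-form minimum-risk portfolio can be checked numerically. The following is a small sketch, assuming the unconstrained fully invested setting described above; it solves $Vw = \mathbf{1}$ by Gaussian elimination and normalizes the result. The function names are illustrative and the code uses only the Python standard library.

```python
def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def minimum_variance_portfolio(V):
    """Return x* = V^{-1} 1 / (1^T V^{-1} 1) for a positive definite V."""
    w = solve_linear_system(V, [1.0] * len(V))
    total = sum(w)
    return [wi / total for wi in w]
```

For an identity covariance matrix this returns equal weights, and for a diagonal covariance matrix it returns weights proportional to the inverse variances, in line with the two special cases above.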
We next discuss two more recent diversification approaches, namely risk parity
and maximum diversification. To that end, we first discuss the related concept
of risk contribution. Observe that the risk (standard deviation) of a portfolio
$x = (x_1, \dots, x_N)$ is given by

$$
\sigma_P(x) = \sqrt{x^T V x}.
$$


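If we compute the partial derivative of this portfolio risk with respect to $x_i$, we obtain
the marginal contribution to risk of asset $i$:

$$
\mathrm{MCR}_i(x) = \frac{\partial \sigma_P(x)}{\partial x_i} = \frac{(Vx)_i}{\sqrt{x^T V x}}, \quad i = 1, \dots, N.
$$

The contribution to risk of asset $i$ is:

$$
\mathrm{CR}_i(x) = x_i \cdot \mathrm{MCR}_i(x) = \frac{x_i \, (Vx)_i}{\sqrt{x^T V x}}.
$$

Observe that

$$
\sum_{i=1}^{N} \mathrm{CR}_i(x) = \sqrt{x^T V x} = \sigma_P(x).
$$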


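Consequently, we say that $x$ is a risk-parity portfolio if all the assets in the
portfolio have the same contribution to risk; that is, if

$$
\mathrm{CR}_i(x) = \frac{\sigma_P(x)}{N}, \quad i = 1, \dots, N.
$$

Again, in the special case when $V = \operatorname{diag}(\sigma)^2$, the fully invested risk-parity
portfolio is

$$
x_i = \frac{1/\sigma_i}{\sum_{j=1}^{N} 1/\sigma_j}, \quad i = 1, \dots, N.
$$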
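The definitions of risk contribution can be sketched in a few lines of code. The example below is illustrative only (the function names are not from the text); it computes the contributions $x_i (Vx)_i / \sqrt{x^T V x}$ and can be used to confirm that they add up to the portfolio risk and that inverse-volatility weights equalize them when $V$ is diagonal.

```python
import math

def portfolio_risk(x, V):
    """Standard deviation sqrt(x^T V x) of portfolio x."""
    n = len(x)
    return math.sqrt(sum(x[i] * V[i][j] * x[j] for i in range(n) for j in range(n)))

def risk_contributions(x, V):
    """Contribution to risk of each asset: x_i * (V x)_i / sqrt(x^T V x)."""
    n = len(x)
    Vx = [sum(V[i][j] * x[j] for j in range(n)) for i in range(n)]
    risk = portfolio_risk(x, V)
    return [x[i] * Vx[i] / risk for i in range(n)]

def inverse_volatility_weights(sigma):
    """Fully invested weights proportional to 1 / sigma_i."""
    inv = [1.0 / s for s in sigma]
    total = sum(inv)
    return [v / total for v in inv]
```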

For a general covariance matrix $V$ and portfolio constraints, it may not be
possible to attain perfect risk parity. In this case, we can instead minimize some
measure of deviation from risk parity. Here are some examples of such measures
proposed in the literature:

$$
\mathrm{DRP}_1(x) = \sum_{i=1}^{N} \sum_{j=1}^{N} \left( x_i (Vx)_i - x_j (Vx)_j \right)^2,
$$

$$
\mathrm{DRP}_2(x) = \sum_{i=1}^{N} \left( \frac{x_i (Vx)_i}{x^T V x} - \frac{1}{N} \right)^2,
$$

$$
\mathrm{DRP}_3(x) = \sum_{i=1}^{N} \left| \frac{x_i (Vx)_i}{x^T V x} - \frac{1}{N} \right|.
$$


The optimization problem associated with minimizing any of these deviation
measures is in general considerably more challenging than other mean-variance
models, as these problems are not convex. The development of efficient numerical
algorithms to solve these kinds of optimization problems is a topic of current
research.

Another approach to diversification is maximum diversification (Choueifaty
and Coignard, 2008). More precisely, the goal is to maximize the diversification ratio

$$
\frac{\sigma^T x}{\sqrt{x^T V x}},
$$

where $\sigma$ is the vector of asset volatilities. A motivation for this approach can
be given as follows: Observe that the diversification ratio is proportional to
the Sharpe ratio if $\mu$ is proportional to $\sigma$. Hence maximizing diversification
is equivalent to maximizing the Sharpe ratio under the assumption that the
expected returns of the assets are proportional to their volatilities.

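In the absence of other constraints, the fully invested maximum diversification
portfolio is the solution to the optimization problem

$$
\begin{array}{ll}
\max_{x} & \dfrac{\sigma^T x}{\sqrt{x^T V x}} \\
\text{s.t.} & \mathbf{1}^T x = 1.
\end{array}
\tag{7.7}
$$

In the special case when $V = \operatorname{diag}(\sigma)^2$, the solution to (7.7) coincides with the
fully invested risk-parity portfolio:

$$
x_i = \frac{1/\sigma_i}{\sum_{j=1}^{N} 1/\sigma_j}, \quad i = 1, \dots, N.
$$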
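As a quick numerical illustration of the diagonal special case, the sketch below evaluates the diversification ratio for a given portfolio. It is only a check under the assumption $V = \operatorname{diag}(\sigma)^2$, for which the inverse-volatility portfolio attains the value $\sqrt{N}$ (by the Cauchy-Schwarz inequality this is the largest possible value in the diagonal case). The function name is illustrative.

```python
import math

def diversification_ratio(x, sigma, V):
    """Ratio sigma^T x / sqrt(x^T V x) for portfolio x."""
    n = len(x)
    weighted_vol = sum(sigma[i] * x[i] for i in range(n))
    variance = sum(x[i] * V[i][j] * x[j] for i in range(n) for j in range(n))
    return weighted_vol / math.sqrt(variance)

# Illustrative data: three uncorrelated assets.
sigma = [0.1, 0.2, 0.4]
V = [[sigma[i] ** 2 if i == j else 0.0 for j in range(3)] for i in range(3)]

inv = [1.0 / s for s in sigma]
x_inverse_vol = [v / sum(inv) for v in inv]   # weights proportional to 1/sigma_i
x_equal = [1.0 / 3.0] * 3                      # equally weighted portfolio
```

With these data, `diversification_ratio(x_inverse_vol, sigma, V)` equals $\sqrt{3}$ up to rounding, while the equally weighted portfolio gives a smaller ratio.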


Exercises 


Exercise 7.1 The purpose of this exercise is to provide a derivation of the
Black-Litterman posterior formula.


(a) Consider the case when views are certain; that is, when $\Omega = 0$. In this case
if we stack the equations for the prior and for the views we get

$$
\begin{bmatrix} \pi \\ q \end{bmatrix} = \begin{bmatrix} \mu + \epsilon_\pi \\ P\mu \end{bmatrix}, \quad \epsilon_\pi \sim N(0, Q).
$$

The estimation problem can then be stated as the following constrained
weighted least-squares problem:

$$
\begin{array}{ll}
\min_{\mu} & (\pi - \mu)^T Q^{-1} (\pi - \mu) \\
\text{s.t.} & P\mu = q.
\end{array}
$$

(i) Write down the optimality conditions for this constrained problem.
(ii) Show that after solving the optimality conditions we obtain

$$
\hat\mu = \pi + Q P^T (P Q P^T)^{-1} (q - P\pi).
$$
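(b) Now consider the case $\Omega = \begin{bmatrix} \Omega_{11} & 0 \\ 0 & 0 \end{bmatrix}$, where $\Omega_{11}$ is non-singular. This
corresponds to the case when the views can be split in two blocks and the
second block of views are certain:

$$
\begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} P_1 \\ P_2 \end{bmatrix} \mu + \begin{bmatrix} \epsilon_1 \\ 0 \end{bmatrix}, \quad \epsilon_1 \sim N(0, \Omega_{11}).
$$

In this case if we stack the equations for the prior and for the views we
get

$$
\begin{bmatrix} \pi \\ q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} \mu + \epsilon_\pi \\ P_1 \mu + \epsilon_1 \\ P_2 \mu \end{bmatrix}, \quad
\begin{bmatrix} \epsilon_\pi \\ \epsilon_1 \end{bmatrix} \sim N(0, \Sigma), \quad
\Sigma = \begin{bmatrix} Q & 0 \\ 0 & \Omega_{11} \end{bmatrix}.
$$

The estimation problem can then be stated as the following constrained
weighted least-squares problem:

$$
\begin{array}{ll}
\min_{\mu} & \begin{bmatrix} \pi - \mu \\ q_1 - P_1 \mu \end{bmatrix}^T \Sigma^{-1} \begin{bmatrix} \pi - \mu \\ q_1 - P_1 \mu \end{bmatrix} \\
\text{s.t.} & P_2 \mu = q_2.
\end{array}
$$

(i) Write down the optimality conditions for this constrained problem.
(ii) Show that after solving the optimality conditions we obtain

$$
\hat\mu = \pi + Q P^T (P Q P^T + \Omega)^{-1} (q - P\pi).
$$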


(c) *Reduce the case when $\Omega$ is a general covariance matrix to the case
discussed in step (b). To that end, use the following fact from matrix algebra:
if $\Omega$ is symmetric and positive semidefinite, then there exists an orthogonal
matrix $U$ and a diagonal matrix $\Lambda$ with non-negative entries such that
$\Omega = U \Lambda U^T$. Use $U$ to make a change of variables so as to write the views
as in step (b) and conclude that the expression (7.3) for the posterior $\hat\mu$
holds.


Exercise 7.2 The purpose of this exercise is to explore the effect of estima- 
tion error on the computation of efficient portfolios by comparing the “true”, 
“estimated”, and “actual” efficient frontiers. To that end, assume the expected 
return and covariance matrix in the Excel spreadsheet “Exercise 7.2 & 7.3 Eight 
Assets” are the “true” values for the expected returns and covariances for a set 
of eight assets. These are monthly expected returns and covariances. 

Next, using these “true” values and assuming a multivariate normal distribu- 
tion for the returns, generate monthly returns for ten years. You may find the 
MATLAB multivariate normal random number generator mvnrnd useful for this 
purpose. 


(a) Compute the sample mean and the sample covariance matrix of the returns 
you generated. 


(b) Compute at least ten long-only efficient portfolios along the efficient frontier
based on the estimates found in part (a). Choose efficient portfolios whose
expected returns range from that of the long-only minimum-variance portfolio
to that of the long-only portfolio with maximum expected return. Save
these efficient portfolios.

(c) Now compute the “actual” expected returns and standard deviations for the
portfolios found in step (b). These are the expected returns and standard
deviations of these portfolios under the true parameters.

(d) On the same figure plot the “estimated” efficient frontier found in (b), the
“actual” frontier from step (c), and the “true” frontier (the one we would
get if we used the true parameters).

(e) Repeat the above steps (generate a ten-year history, estimate, compute
efficient portfolios) a few times. What do you observe?


Exercise 7.3 The Excel spreadsheet “Exercise 7.2 & 7.3 Eight Assets” provides
monthly expected returns and covariance matrix for eight asset classes.


(a) Find the long-only portfolio with maximum Sharpe ratio, assuming a zero
risk-free interest rate.

(b) Assume your initial portfolio is equally divided among the eight asset classes.
Repeat step (a) but under the additional restriction that the two-sided
turnover is at most 60%.

(c) Assume the benchmark is an equally divided portfolio and the risk-free
interest rate is zero. Find the vector of equilibrium returns $\pi$.

(d) Suppose an investor has the following two views:

View 1: the return on Euro bonds will be 0.40%.

View 2: the return on an equally weighted portfolio of USA and UK stocks
will be 1.2%.

Use the Black-Litterman model to merge these views with the equilibrium
returns. Assume the investor has total confidence in the views.


Exercise 7.4 Consider the problem of finding the maximum diversified fully
invested portfolio in a universe of $n$ risky assets:

$$
\begin{array}{ll}
\max_{x} & \dfrac{\sigma^T x}{\sqrt{x^T V x}} \\
\text{s.t.} & \mathbf{1}^T x = 1.
\end{array}
$$

Here $\sigma$ is the vector of asset volatilities and $V$ is the covariance matrix.


(a) Show that the maximum diversified portfolio (i.e., the solution to the above
problem) is

$$
x_{MD} = \frac{V^{-1} \sigma}{\mathbf{1}^T V^{-1} \sigma}.
$$

(b) Use part (a) to show that

$$
x_{MD} = \frac{\sigma_{MD}^2}{\sigma_A} \, V^{-1} \sigma,
$$

where $\sigma_{MD}$ and $\sigma_A$ are respectively the volatility of $x_{MD}$ and the weighted
average volatility of the assets in $x_{MD}$. In other words,

$$
\sigma_{MD}^2 = x_{MD}^T V x_{MD}, \quad \sigma_A = \sigma^T x_{MD}.
$$

(c) Suppose the covariance matrix has the following constant-correlation form:
for some $\rho \in (0, 1)$,

$$
V_{ii} = \sigma_i^2, \quad V_{ij} = \rho \sigma_i \sigma_j, \quad \text{for } i = 1, \dots, n, \text{ and } j = 1, \dots, n, \text{ with } i \ne j.
$$

In matrix form, we can write the above constant-correlation matrix as follows:

$$
V = \rho \sigma \sigma^T + (1 - \rho) \operatorname{Diag}(\sigma)^2,
$$

where $\sigma$ is the vector with components $\sigma_i$, $i = 1, \dots, n$.
Show that in this case the holdings of the maximum diversified portfolio
$x_{MD}$ are given by

$$
(x_{MD})_i = \frac{1/\sigma_i}{\sum_{j=1}^{n} 1/\sigma_j}, \quad i = 1, \dots, n.
$$


Exercise 7.5 The purpose of this exercise is to visualize how the covariance 
matrix gets distorted when it is estimated using a finite set of observations. The 
exercise also explores how a shrinkage technique of Ledoit and Wolf can mitigate 
this kind of distortion. 


(a) Assume $n = 10$ assets have returns that follow a multivariate normal
distribution with expected returns equal to zero and true covariance matrix equal
to the $n \times n$ diagonal matrix

$$
V = \operatorname{diag}(0.8, \; 0.85, \; \dots, \; 1.2, \; 1.25).
$$

(The diagonal entries are equally spaced at 0.05 intervals.)

Generate $T = 120$ samples $r_t$, $t = 1, \dots, T$, from this joint distribution.
Each of these samples $r_t \in \mathbb{R}^{10}$ is drawn from the ten-dimensional multivariate
normal distribution $N(0, V)$. You may find the MATLAB multivariate
normal random number generator mvnrnd useful for this purpose.

(i) Use the $T$ samples to estimate the sample covariance matrix $\hat V$ as
follows. Let $\bar r := (1/T) \sum_{t=1}^{T} r_t$, $z_t := r_t - \bar r$, $t = 1, \dots, T$, and

$$
\hat V := \frac{1}{T} \sum_{t=1}^{T} z_t z_t^T.
$$

Plot the eigenvalues both of the true covariance matrix $V$ and of the
estimated covariance $\hat V$ on the same plot. Do you observe anything
peculiar?

(ii) Using the estimated covariance $\hat V$ find the estimated minimum-risk fully
invested portfolio $\hat x$. Compute the estimated minimum variance $\hat x^T \hat V \hat x$,
the actual minimum variance $\hat x^T V \hat x$, and the true minimum variance
$(x^*)^T V x^*$, where $x^*$ is the true minimum-risk fully invested portfolio
for $V$. What do you observe?

(iii) Repeat parts (i) and (ii) several times (anywhere from a handful to a
few thousand times). What do you observe?


(b) We will next apply the shrinkage technique of Ledoit and Wolf. To that end,
let $\lambda_i$, $i = 1, \dots, n$, denote the eigenvalues of the sample covariance matrix
$\hat V$ and $\bar\lambda := (1/n) \sum_{i=1}^{n} \lambda_i$. Define $C := \bar\lambda I$ and

$$
\alpha := \min \left\{ \frac{\dfrac{1}{T^2} \sum_{t=1}^{T} \operatorname{trace}\left( (z_t z_t^T - \hat V)^2 \right)}{\operatorname{trace}\left( (\hat V - C)^2 \right)}, \; 1 \right\}.
$$

Finally consider the shrunken matrix

$$
\tilde V := (1 - \alpha) \hat V + \alpha C.
$$

(i) Plot the eigenvalues of the true covariance matrix $V$, of the sample
covariance $\hat V$, and of the shrunken covariance $\tilde V$ on the same plot.
What do you observe now?

(ii) Using the shrunken covariance $\tilde V$ find the estimated minimum-risk fully
invested portfolio $\tilde x$. Compute the estimated minimum variance $\tilde x^T \tilde V \tilde x$,
the actual minimum variance $\tilde x^T V \tilde x$, and the true minimum variance
$(x^*)^T V x^*$, where $x^*$ is the true minimum-risk fully invested portfolio
for $V$. What do you observe? Are the results any different from part
(a)(ii)?

(iii) Repeat parts (i) and (ii) several times (anywhere from a handful to a
few thousand times). What do you observe? Are the results any different
from part (a)(iii)?


Mixed Integer Programming: Theory and Algorithms


8.1 Mixed Integer Programming


The types of optimization models that we have discussed so far, namely linear 
and quadratic programming, allow variables to take a continuum of values. In 
particular, the numerical solutions to these kinds of models may have fractional 
values. For instance, the solution to a portfolio construction model could suggest 
a plan to purchase 3205.76 shares of stock XYZ. In many cases it is natural to 
round this value and to interpret it as a suggestion to purchase 3205 or even 
3200 shares of stock XYZ. However, if a variable in an optimization model is 
associated with choosing among two or more alternatives, for example, as in the 
capital budgeting problem described below, then a model that suggests taking 
fractions of each of the alternatives would be of limited value. Instead, a binary 
decision, namely “to choose” or “not to choose”, needs to be made for each 
alternative. 

In general, an integer variable in an optimization model is a variable that is
restricted to take integer values only. A mixed integer program is an optimization
problem with the constraint that some of the variables must take integer values.
In particular a mixed integer linear program is a problem of the form

$$
\begin{array}{ll}
\min_{x} & c^T x \\
\text{s.t.} & Ax = b \\
& Dx \ge d \\
& x_j \in \mathbb{Z}, \; j \in J
\end{array}
\tag{8.1}
$$

for some vectors $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $d \in \mathbb{R}^p$, matrices $A \in \mathbb{R}^{m \times n}$, $D \in \mathbb{R}^{p \times n}$, and
subset $J \subseteq \{1, \dots, n\}$ of the variables. When all variables are restricted to be
integer, that is, when $J = \{1, \dots, n\}$, the problem (8.1) is called a pure integer
linear program.

An important case occurs when a model includes binary variables; that is, 
variables that are restricted to take the value 0 or 1. When all the variables in 
a mixed integer program are of this kind, it is called a binary program. As the 
examples below show, binary variables enable the modeling of important realistic 
features such as logical constraints, cardinality and threshold constraints, and 
others. However, this improvement in modeling power comes with a tradeoff in 
computational cost. The presence of a significant number of integer variables in 
an optimization problem can make it extremely difficult or impossible to solve
unless there is a specific exploitable structure.


Example 8.1 (Capital budgeting) Suppose we have a capital of 19 million dollars 
for long-term investment and have identified four investment opportunities with 
the following investment requirements and net present values (in million dollars): 


                        Investment 1   Investment 2   Investment 3   Investment 4

Required investment          7              10              6              3
Net present value            9              11              7              4


What investments should we choose to maximize our total net present value? 
Each investment is a “take it or leave it” opportunity: the investment must be 
funded entirely or not at all. 


This problem can be formulated as the following binary linear programming
model.


Binary linear programming model for capital budgeting

Variables:

$$
x_i = \begin{cases} 1 & \text{if investment } i \text{ is undertaken} \\ 0 & \text{otherwise} \end{cases} \quad \text{for } i = 1, \dots, 4.
$$

Objective, in millions of dollars:

$$
\max \; 9x_1 + 11x_2 + 7x_3 + 4x_4
$$

Constraints:

$$
\begin{array}{ll}
7x_1 + 10x_2 + 6x_3 + 3x_4 \le 19 & \text{(budget constraint)} \\
x_i \in \{0, 1\} \text{ for } i = 1, \dots, 4 & \text{(binary variables).}
\end{array}
$$


The optimal solution to the linear programming relaxation of this model,
obtained by relaxing the binary constraints $x_i \in \{0, 1\}$, for $i = 1, \dots, 4$, to
$0 \le x_i \le 1$, for $i = 1, \dots, 4$, is

$$
x_1 = 1, \quad x_2 = 0.3, \quad x_3 = 1, \quad x_4 = 1.
$$

This is not a feasible solution as $x_2$ is not binary. If we round $x_2$ to 0 we get a
feasible solution. However, a better (and in fact the optimal) solution is

$$
x_1 = 0, \quad x_2 = 1, \quad x_3 = 1, \quad x_4 = 1.
$$

This could be counterintuitive as the optimal solution leaves out Investment 1, which
has one of the best “bang for the buck” ratios; that is, one of the highest ratios of
net present value to investment requirement ($9/7 \approx 1.29$, second only to
$4/3 \approx 1.33$ for Investment 4).
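Because the model has only four binary variables, both the binary program and its linear programming relaxation can be checked directly. The following is a minimal sketch: the binary program is solved by enumerating all $2^4$ choices, and the relaxation is solved by the greedy rule for a fractional knapsack (fill by decreasing ratio of net present value to required investment), which is valid here because there is a single budget constraint with positive data. The function names are illustrative.

```python
from itertools import product

cost = [7, 10, 6, 3]   # required investments (million dollars)
npv = [9, 11, 7, 4]    # net present values (million dollars)
budget = 19

def solve_binary_by_enumeration(cost, npv, budget):
    """Try every 0/1 vector and keep the best one that fits the budget."""
    best_x, best_value = None, None
    for x in product((0, 1), repeat=len(cost)):
        if sum(c * xi for c, xi in zip(cost, x)) <= budget:
            value = sum(v * xi for v, xi in zip(npv, x))
            if best_value is None or value > best_value:
                best_x, best_value = x, value
    return best_x, best_value

def solve_lp_relaxation_greedy(cost, npv, budget):
    """Fractional knapsack: fund investments by decreasing npv/cost ratio."""
    x = [0.0] * len(cost)
    remaining = float(budget)
    order = sorted(range(len(cost)), key=lambda i: npv[i] / cost[i], reverse=True)
    for i in order:
        if remaining <= 0:
            break
        x[i] = min(1.0, remaining / cost[i])
        remaining -= x[i] * cost[i]
    return x, sum(v * xi for v, xi in zip(npv, x))

x_bin, value_bin = solve_binary_by_enumeration(cost, npv, budget)
x_lp, value_lp = solve_lp_relaxation_greedy(cost, npv, budget)
```

The relaxation value is an upper bound on the binary optimum for this maximization problem, and rounding the fractional entry down gives a feasible but suboptimal choice.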

The presence of binary variables also readily enables the modeling of logical
restrictions. For example, the logical restriction

    If Investment 2 is made then Investment 4 must also be made

can be modeled via the constraint

$$
x_2 \le x_4.
$$

Similarly, the logical constraint

    If Investment 1 is made then Investment 3 must not be made

can be modeled via the constraint

$$
x_1 + x_3 \le 1.
$$
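The equivalence between each logical restriction and its linear constraint can be verified by checking every combination of binary values. The short sketch below does exactly that; the helper names are illustrative.

```python
from itertools import product

def if_2_then_4(x2, x4):
    """Logical statement: Investment 2 made implies Investment 4 made."""
    return (x2 == 0) or (x4 == 1)

def if_1_then_not_3(x1, x3):
    """Logical statement: Investment 1 made implies Investment 3 not made."""
    return (x1 == 0) or (x3 == 0)

# Compare each logical statement with its linear constraint on all binary values.
implication_matches = all(
    if_2_then_4(x2, x4) == (x2 <= x4) for x2, x4 in product((0, 1), repeat=2)
)
exclusion_matches = all(
    if_1_then_not_3(x1, x3) == (x1 + x3 <= 1) for x1, x3 in product((0, 1), repeat=2)
)
```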


Example 8.2 (Clustering) Clustering is a popular technique in data analysis. 
It is concerned with partitioning a collection of objects into subsets or “clusters” 
so that the objects within each cluster are more closely related with each other 
than with objects assigned to different clusters. Suppose we wish to partition a 
collection of N objects into K < N clusters based on some kind of similarity 
measure: 


$$
\rho_{ij} = \text{similarity measure between objects } i \text{ and } j.
$$


To give a financial flavor to this example, assume the objects to be clustered are
$N$ stocks and the similarity measure $\rho_{ij}$ is the correlation between the returns
of stocks $i$ and $j$.


We next describe a possible approach to the above clustering problem via
binary programming. This approach is closely related to the popular K-median
problem. Before diving into the binary programming formulation, we describe
some of the main ideas. Assume the objects are indexed $1, \dots, N$ and are to
be partitioned into the $K$ clusters $C_1, \dots, C_K$. A key idea is to designate an
element $j_\ell$ in each cluster $C_\ell$ as the centroid of cluster $C_\ell$. This choice suggests
the following natural measure of the similarity within cluster $C_\ell$:

$$
\sum_{i \in C_\ell} \rho_{i j_\ell}
$$

and in turn it gives the following overall measure of the quality of the clusters
$C_1, \dots, C_K$:

$$
\sum_{\ell=1}^{K} \sum_{i \in C_\ell} \rho_{i j_\ell}.
$$

The following crucial observation is key in our formulation. The centroid $j_\ell$
represents the elements in cluster $C_\ell$. Indeed, each cluster contains precisely the
objects assigned to its centroid, and the clusters are completely determined by
the choice of the centroids. These ideas are formalized in the following binary
linear programming model.


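Binary linear programming model for clustering

Variables:

$$
y_j = \begin{cases} 1 & \text{if } j \text{ is a centroid} \\ 0 & \text{otherwise} \end{cases} \quad \text{for } j = 1, \dots, N.
$$

$$
x_{ij} = \begin{cases} 1 & \text{if } i \text{ is represented by } j \\ 0 & \text{otherwise} \end{cases} \quad \text{for } i, j = 1, \dots, N.
$$

Objective:

$$
\max \; \sum_{j=1}^{N} \sum_{i=1}^{N} \rho_{ij} x_{ij}.
$$

Constraints:

$$
\begin{array}{lll}
\sum_{j=1}^{N} y_j = K & & \text{(choose } K \text{ centroids)} \\
\sum_{j=1}^{N} x_{ij} = 1 & \text{for } i = 1, \dots, N & \text{(each object must be represented by one centroid)} \\
x_{ij} \le y_j & \text{for } i, j = 1, \dots, N & \text{(} i \text{ is represented by } j \text{ only if } j \text{ is a centroid)} \\
x_{ij}, \, y_j \in \{0, 1\} & \text{for } i, j = 1, \dots, N & \text{(binary variables).}
\end{array}
$$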


Another correct formulation is obtained if we replace the third set of $N^2$
constraints

$$
x_{ij} \le y_j, \quad \text{for } i, j = 1, \dots, N
$$

with the set of $N$ constraints

$$
\sum_{i=1}^{N} x_{ij} \le N y_j, \quad \text{for } j = 1, \dots, N.
$$
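For very small instances the clustering model can be solved by brute force, which makes the role of the centroids explicit: once the $K$ centroids are chosen, each object is best assigned to its most similar centroid. The sketch below enumerates all centroid sets; it is an illustration with hypothetical names and is not meant for realistic values of $N$.

```python
from itertools import combinations

def best_clustering(rho, K):
    """Enumerate all sets of K centroids; assign each object to its most similar centroid.

    Returns (total similarity, tuple of centroids, list mapping object -> centroid).
    """
    N = len(rho)
    best = None
    for centroids in combinations(range(N), K):
        assignment = [max(centroids, key=lambda j: rho[i][j]) for i in range(N)]
        value = sum(rho[i][assignment[i]] for i in range(N))
        if best is None or value > best[0]:
            best = (value, centroids, assignment)
    return best

def clusters_from_assignment(assignment):
    """Group objects by the centroid that represents them."""
    groups = {}
    for obj, centroid in enumerate(assignment):
        groups.setdefault(centroid, set()).add(obj)
    return {frozenset(members) for members in groups.values()}
```

With a similarity matrix in which objects 0 and 1 are highly correlated with each other, and likewise objects 2 and 3, calling `best_clustering` with `K = 2` recovers those two groups.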


8.2 Numerical Mixed Integer Programming Solvers


Excel Solver 


The steps required for solving a mixed integer (or binary) linear program in 
Excel Solver are nearly identical to those for solving linear programs. The only 
new step is to state that some variables are integer (or binary). 

Figure 8.1 displays a printout of an Excel spreadsheet implementation of the 
binary linear programming model for Example 8.1 as well as the dialog box 
obtained when we run the Excel add-in Solver. The spreadsheet model con- 
tains the three components of the binary program. The decision variables are 
in the range B7:E7. The left-hand side of the budget constraint is in cell B9
and the objective function is in cell B10. The Excel formulas in cells B9 and B10
are SUMPRODUCT(B4:E4,B7:E7) and SUMPRODUCT(B5:E5,B7:E7) respectively. In
addition to these components, notice the constraint


$B$7:$E$7 = binary 


in the Solver dialog box. 


[Figure 8.1 screenshot. Spreadsheet rows: investment requirement, net present
value, choose?, total investment requirements (<= 19), total net present value.
Solver Parameters dialog: Set Objective $B$10; By Changing Variable Cells
$B$7:$E$7; Subject to the Constraints $B$7:$E$7 = binary and $B$9 <= $D$9;
Select a Solving Method: Simplex LP.]


Figure 8.1 Spreadsheet implementation and the Solver dialog box for the capital
budgeting model


MATLAB CVX 


Figure 8.2 displays a CVX script for the same problem. 


    % Data for the capital budgeting model of Example 8.1
    n = 4;
    invest_required = [7;10;6;3];   % required investments (million dollars)
    npv = [9;11;7;4];               % net present values (million dollars)

    cvx_begin
        cvx_solver gurobi           % solver that handles binary variables
        variable x(n) binary;
        maximize(npv'*x)            % total net present value
        invest_required'*x <= 19;   % budget constraint
    cvx_end

Figure 8.2 MATLAB CVX code for capital budgeting model 


Using either of these solvers we obtain the optimal solution:

$$
x^* = \begin{bmatrix} 0 & 1 & 1 & 1 \end{bmatrix}^T.
$$
8.3 Relaxations and Duality


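A relaxation of an optimization model

$$
\begin{array}{ll}
\min_{x} & f(x) \\
\text{s.t.} & x \in \mathcal{X}
\end{array}
$$

is another optimization model

$$
\begin{array}{ll}
\min_{x} & \tilde f(x) \\
\text{s.t.} & x \in \tilde{\mathcal{X}}
\end{array}
$$

that satisfies $\mathcal{X} \subseteq \tilde{\mathcal{X}}$ and $\tilde f(x) \le f(x)$ for $x \in \mathcal{X}$. In other words, a relaxation is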
a less stringent optimization model obtained by “relaxing” some of the original 
constraints and “relaxing” its objective function. Relaxation plays a central role 
in a variety of algorithms for solving mixed integer programs. We next describe 
two widely used types of relaxations for mixed integer programming, namely 
linear programming and Lagrangian relaxations. 


Linear Programming Relaxation 


The linear programming relaxation of the mixed integer linear program (8.1) is 
the linear program obtained by dropping the integrality constraints; that is, 


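$$
\begin{array}{ll}
\min_{x} & c^T x \\
\text{s.t.} & Ax = b \\
& Dx \ge d.
\end{array}
\tag{8.2}
$$

Similarly, the linear programming relaxation of a mixed binary linear program

$$
\begin{array}{ll}
\min_{x} & c^T x \\
\text{s.t.} & Ax = b \\
& Dx \ge d \\
& x_j \in \{0, 1\}, \; j \in J
\end{array}
\tag{8.3}
$$

is the linear program

$$
\begin{array}{ll}
\min_{x} & c^T x \\
\text{s.t.} & Ax = b \\
& Dx \ge d \\
& 0 \le x_j \le 1, \; j \in J.
\end{array}
\tag{8.4}
$$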


The proof of the following proposition is straightforward and we leave it as an 
exercise. 


Proposition 8.3 Consider the mixed integer program (8.1) and its linear 
programming relaxation (8.2). Then the following facts hold. 


(a) The optimal value of the relaxation (8.2) is less than or equal to the optimal
value of the mixed integer linear program (8.1).

(b) If the relaxation (8.2) is infeasible, then so is the mixed integer linear
program (8.1).

(c) If the optimal solution $x^*$ of the relaxation (8.2) satisfies $x_j^* \in \mathbb{Z}$ for $j \in J$
then $x^*$ is also an optimal solution to the mixed integer linear program (8.1).


The analogous facts also hold for the mixed binary linear program (8.3) and 
its linear relaxation (8.4). Proposition 8.3 suggests a possible avenue for solving 
(8.1): solve the (more tractable) linear programming relaxation (8.2). If this 
solution satisfies the relevant integrality constraints, then we have solved (8.1). 
If not, the lower bound obtained by solving (8.2) provides valuable information. 
For instance, if we can find a feasible solution to (8.1), then the quality of this 
solution can be assessed by comparing it with the lower bound obtained from 
solving (8.2). The sections below elaborate on this idea. In particular, Section 8.4
sketches algorithms that solve mixed integer linear programs by systematically
solving a sequence of linear programming relaxations.

Proposition 8.3 also leads to the following somewhat counterintuitive conclusion
about integer programming formulations. As noted in Example 8.2, there
could be several correct and thus equivalent integer linear programming formulations
of a given problem. Among them, it is generally better to have a formulation
with a “tight” linear programming relaxation. Typically, a formulation with more
constraints has a tighter linear programming relaxation and hence might be easier
to solve.
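The two formulations of the linking constraints in Example 8.2 illustrate this point. For binary values they accept exactly the same points, but once the variables are relaxed to the interval $[0, 1]$ the aggregated version admits fractional points that the disaggregated version excludes. The sketch below checks both claims on a tiny instance with $N = 2$; the function names and the fractional point are illustrative choices.

```python
def satisfies_disaggregated(x, y):
    """x[i][j] <= y[j] for every pair (i, j): N^2 constraints."""
    n = len(y)
    return all(x[i][j] <= y[j] for i in range(n) for j in range(n))

def satisfies_aggregated(x, y):
    """sum_i x[i][j] <= N * y[j] for every j: N constraints."""
    n = len(y)
    return all(sum(x[i][j] for i in range(n)) <= n * y[j] for j in range(n))

# A fractional point with N = 2 and K = 1: every object is fully assigned
# (each row of x sums to 1) and the centroid variables sum to K = 1.
y_frac = [0.5, 0.5]
x_frac = [[1.0, 0.0], [0.0, 1.0]]
```

Here `x_frac`, `y_frac` passes the aggregated check but fails the disaggregated one, so the formulation with $N^2$ constraints has the tighter relaxation even though it is larger.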


Lagrangian Relaxation 


The Lagrangian framework discussed in previous chapters can be extended to 
obtain relaxations of an optimization model. The intuitive idea is to obtain a 
relaxation of a model by shifting a set of “difficult” constraints to the objective. 
To be more precise, consider an optimization problem of the form 


$$
\begin{array}{ll}
\min_{x} & c^T x \\
\text{s.t.} & Ax = b \\
& x \in \mathcal{X},
\end{array}
\tag{8.5}
$$

where the combined set of constraints $Ax = b$, $x \in \mathcal{X}$ is “difficult” but the set of
constraints $x \in \mathcal{X}$ is “easy”. (We will discuss a concrete example of this situation
in Section 8.3.3.) A relaxation for (8.5) can be obtained as follows. Assume $u$ is
a vector of suitable dimension and consider the following problem without the
difficult constraints:

$$
\begin{array}{rl}
L(u) := \min_{x} & c^T x + u^T (b - Ax) \\
\text{s.t.} & x \in \mathcal{X}.
\end{array}
\tag{8.6}
$$


The problem (8.6) is a Lagrangian relaxation of (8.5). The following proposition 
is in the same spirit as Proposition 8.3. Again, its proof is straightforward and 
we leave it as an exercise. 
Proposition 8.4 Consider the optimization problem (8.5) and its Lagrangian
relaxation (8.6) for some vector $u$. Then the following facts hold.


(a) The optimal value L(u) of the relaxation (8.6) is less than or equal to the 
optimal value of (8.5). 

(b) If the optimal solution x* of the relaxation (8.6) satisfies Ax* = b then x* 
is also an optimal solution to (8.5). 


The Lagrangian dual of (8.5) is the problem of finding the best Lagrangian 
relaxation 


$$
\max_{u} \; L(u),
$$

where $L(u)$ is the optimal value of (8.6). We note that the function $u \mapsto L(u)$
is concave; that is, $u \mapsto -L(u)$ is convex. Thus the Lagrangian dual is a convex
optimization problem.
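A small numerical sketch may help fix ideas. It assumes a toy instance that is not from the text: minimize $c^T x$ subject to a single equality $a^T x = b$ with $x \in \{0, 1\}^n$. With the equality moved to the objective, the relaxation separates by coordinate, so $L(u) = ub + \sum_j \min(0, c_j - u a_j)$. The code compares this bound with the true optimum found by enumeration. All names and data are illustrative.

```python
from itertools import product

c = [4.0, 3.0, 5.0, 2.0]   # objective coefficients (illustrative data)
a = [2.0, 1.0, 3.0, 1.0]   # coefficients of the "difficult" equality a.x = b
b = 4.0

def exact_optimum(c, a, b):
    """Minimum of c.x over binary x with a.x = b, by enumeration."""
    values = [
        sum(cj * xj for cj, xj in zip(c, x))
        for x in product((0, 1), repeat=len(c))
        if sum(aj * xj for aj, xj in zip(a, x)) == b
    ]
    return min(values)

def lagrangian_value(u, c, a, b):
    """L(u) = min over binary x of c.x + u * (b - a.x); separable in the x_j."""
    return u * b + sum(min(0.0, cj - u * aj) for cj, aj in zip(c, a))

grid = [k * 0.5 for k in range(-10, 11)]           # multipliers from -5 to 5
bounds = [lagrangian_value(u, c, a, b) for u in grid]
best_bound = max(bounds)                            # Lagrangian dual over the grid
optimum = exact_optimum(c, a, b)
```

Every entry of `bounds` is a lower bound on `optimum`, as Proposition 8.4 states; on this particular instance the best bound on the grid happens to match the optimum, which is not guaranteed in general.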

The Lagrangian relaxation and Lagrangian dual also extend to problems where 
the set of difficult constraints involves both equalities and inequalities. We simply 
need to be a bit careful about the sign of the multipliers for the inequality 
constraints. Consider the optimization problem 


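$$
\begin{array}{ll}
\min_{x} & c^T x \\
\text{s.t.} & Ax = b \\
& Dx \ge d \\
& x \in \mathcal{X}.
\end{array}
\tag{8.7}
$$

Given vectors $u, v$ of suitable dimension with $v \ge 0$ we obtain the following
Lagrangian relaxation:

$$
\begin{array}{rl}
L(u, v) := \min_{x} & c^T x + u^T (b - Ax) + v^T (d - Dx) \\
\text{s.t.} & x \in \mathcal{X}.
\end{array}
\tag{8.8}
$$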


We have the following extended version of Proposition 8.4. 


Proposition 8.5 Consider the optimization problem (8.7) and its Lagrangian
relaxation (8.8) for some vectors $u, v$ with $v \ge 0$. Then the following facts hold.


(a) The optimal value $L(u, v)$ of the relaxation (8.8) is less than or equal to the
optimal value of (8.7).

(b) If the optimal solution $x^*$ of the relaxation (8.8) satisfies $Ax^* = b$, $Dx^* \ge d$,
and $v^T (Dx^* - d) = 0$, then $x^*$ is also an optimal solution to (8.7).


The Lagrangian dual of (8.7) is

$$
\max_{u, \; v \ge 0} \; L(u, v).
$$
A Heuristic based on Lagrangian Relaxation for Clustering


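We next describe a particularly successful application of Lagrangian relaxation
for the clustering problem introduced in Example 8.2, namely,

$$
\begin{array}{lll}
\max & \sum_{i=1}^{N} \sum_{j=1}^{N} \rho_{ij} x_{ij} & \\
\text{s.t.} & \sum_{j=1}^{N} y_j = K & \\
& \sum_{j=1}^{N} x_{ij} = 1 & \text{for } i = 1, \dots, N \\
& x_{ij} \le y_j & \text{for } i, j = 1, \dots, N \\
& x_{ij}, \, y_j \in \{0, 1\} & \text{for } i, j = 1, \dots, N.
\end{array}
\tag{8.9}
$$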

The above model can be solved by general-purpose solvers such as Excel
Solver or CVX, or even commercial solvers like Gurobi or CPLEX, only for
relatively small values of $N$. One of the main difficulties is that the model involves
$N^2 + N$ binary variables and $N^2 + N + 1$ constraints. For a modest value of
$N$ like $N = 100$, the model becomes unmanageable if tackled by a standard
solver. At the same time, in practical clustering problem instances $N$ can easily
range in the hundreds or thousands. A heuristic based on Lagrangian relaxation
developed by Cornuéjols et al. (1977) can compute approximate solutions to (8.9)
for virtually unlimited values of $N$. We next describe the main ideas behind this
heuristic. Consider the following Lagrangian relaxation to (8.9): given a vector
of multipliers $u = [u_1 \; \cdots \; u_N]^T$ let

$$
\begin{array}{rll}
L(u) := \max & \sum_{i=1}^{N} \sum_{j=1}^{N} \rho_{ij} x_{ij} + \sum_{i=1}^{N} u_i \left( 1 - \sum_{j=1}^{N} x_{ij} \right) & \\
\text{s.t.} & \sum_{j=1}^{N} y_j = K & \\
& x_{ij} \le y_j, & i, j = 1, \dots, N \\
& x_{ij}, \, y_j = 0 \text{ or } 1, & i, j = 1, \dots, N.
\end{array}
\tag{8.10}
$$


This Lagrangian relaxation has moved the “difficult” constraints $\sum_{j=1}^{N} x_{ij} = 1$,
with $i = 1, \dots, N$, to the objective via the multipliers $u$ and has kept the
remaining constraints as the “easy” ones. This Lagrangian relaxation satisfies
the following key properties:


Property 1: $L(u) \ge Z$, where $Z$ is the optimal value of (8.9). This is an
      immediate consequence of Proposition 8.4, with the inequality reversed
      because (8.9) is a maximization problem.
Property 2: For a given $u$, (8.10) is easy to solve. To see this, first notice that

$$
\begin{array}{rll}
L(u) := \max & \sum_{i=1}^{N} \sum_{j=1}^{N} (\rho_{ij} - u_i) x_{ij} + \sum_{i=1}^{N} u_i & \\
\text{s.t.} & \sum_{j=1}^{N} y_j = K & \\
& x_{ij} \le y_j, & i, j = 1, \dots, N \\
& x_{ij}, \, y_j \in \{0, 1\}, & i, j = 1, \dots, N.
\end{array}
$$

      Given $y$, this shows that $x_{ij}$ should be set to its upper bound $y_j$ or to its
      lower bound 0, depending on whether the objective coefficient $\rho_{ij} - u_i$
      of $x_{ij}$ is positive or negative. Therefore $L(u)$ can be rewritten as

$$
\begin{array}{rl}
L(u) = \max & \sum_{j=1}^{N} C_j y_j + \sum_{i=1}^{N} u_i \\
\text{s.t.} & \sum_{j=1}^{N} y_j = K \\
& y_j \in \{0, 1\}, \quad j = 1, \dots, N
\end{array}
\tag{8.11}
$$

      for $C_j := \sum_{i=1}^{N} \max(0, \rho_{ij} - u_i)$. Finally, observe that the solution to
      (8.11) is easily computable: Sort the $C_j$ in decreasing order, say
      $C_{j_1} \ge C_{j_2} \ge \cdots \ge C_{j_N}$. The optimal solution to (8.11) is obtained by setting
      $y_{j_1} = \cdots = y_{j_K} = 1$ and the remaining $y_j$s to zero. We get
      $L(u) = \sum_{k=1}^{K} C_{j_k} + \sum_{i=1}^{N} u_i$.

Property 3: Based on the optimal solution $y$ of $L(u)$ obtained in Property 2,
      one can get a heuristic (ad hoc) solution $(\bar x, y)$ for (8.9) and an
      assessment of how good it is.


      • Each $u$ gives the upper bound $L(u) \ge Z$ and the following heuristic
        feasible solution $\bar x$ to (8.9). Let $y$ solve (8.11). Next, for each
        $i = 1, \dots, N$, assign $i$ to the most similar centroid among the $K$
        centroids such that $y_j = 1$. That is, let $j(i) = \operatorname{argmax}_{j : y_j = 1} \rho_{ij}$ and
        let $\bar x$ be as follows:

$$
\bar x_{ij} = \begin{cases} 1 & \text{if } j = j(i) \\ 0 & \text{otherwise.} \end{cases}
$$

      • If $\sum_{i,j} \rho_{ij} \bar x_{ij}$ and $L(u)$ are close to each other, then we have a
        near-optimal solution. To see this, observe that $\sum_{i,j} \rho_{ij} \bar x_{ij} \le Z \le L(u)$.
        Thus if $\sum_{i,j} \rho_{ij} \bar x_{ij}$ and $L(u)$ are close to each other, they must be
        close to the optimal value $Z$ as well.

      • To get the best upper bound $L(u)$ together with a heuristic solution
        of the above kind, solve

$$
\min_{u} \; L(u).
$$

        This turns out to be a manageable convex optimization problem.
        In particular, it is amenable to a subgradient algorithm that we
        describe in Chapter 20.
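The three properties can be sketched for a fixed multiplier vector $u$ as follows. This is only an illustration of the bound and the rounding step described above, with hypothetical function names; it does not include the subgradient search over $u$, and the brute-force routine is there solely to check the bounds on a tiny instance.

```python
from itertools import combinations

def lagrangian_bound_and_heuristic(rho, K, u):
    """Return (L(u), heuristic value, chosen centroids, assignment) for a given u."""
    N = len(rho)
    # C_j = sum_i max(0, rho_ij - u_i), as in (8.11)
    C = [sum(max(0.0, rho[i][j] - u[i]) for i in range(N)) for j in range(N)]
    centroids = sorted(range(N), key=lambda j: C[j], reverse=True)[:K]
    upper = sum(C[j] for j in centroids) + sum(u)
    # Heuristic: assign each object to its most similar chosen centroid
    assignment = [max(centroids, key=lambda j: rho[i][j]) for i in range(N)]
    lower = sum(rho[i][assignment[i]] for i in range(N))
    return upper, lower, sorted(centroids), assignment

def exact_value(rho, K):
    """Optimal value Z of (8.9) by enumerating all centroid sets (tiny N only)."""
    N = len(rho)
    return max(
        sum(max(rho[i][j] for j in centroids) for i in range(N))
        for centroids in combinations(range(N), K)
    )
```

For any $u$ the returned values satisfy `lower <= Z <= upper`, so the gap `upper - lower` is a computable certificate of how far the heuristic clustering can be from optimal.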


8.4 Algorithms for Solving Mixed Integer Programs


The modeling power of mixed integer programming comes with some cost. Mixed 
integer programming belongs to the class of NP-hard computational problems 
(Conforti et al., 2014). In layman’s terms, this means that, unlike convex
optimization problems, mixed integer programs cannot be expected to be solvable
with fast and reliable numerical algorithms. The algorithms
that we describe next can in principle solve any mixed integer linear program in
finitely many steps. However, the NP-hardness of integer programming implies
that for some problem instances the computational cost incurred by these
algorithms could be insurmountable even with any foreseeable amount of
computational power.

The two most popular generic methods for solving mixed integer linear 
programs are cutting planes and branch and bound. Both of these methods rely 
extensively on linear programming relaxations. A cutting plane is a new linear 
constraint to the linear programming relaxation that “cuts off” non-integer 
solutions without cutting off any feasible solution of the original mixed integer 
linear program. The method of cutting planes was proposed by Dantzig et 
al. (1954) in the context of the traveling-salesman problem, and by Gomory 
(1958, 1960) for pure integer linear programs and mixed integer linear programs, 
respectively. The method is based on solving a sequence of increasingly tighter 
linear programming relaxations by adding cutting planes until a solution to the 
mixed integer linear program is found. On the other hand, Land and Doig (1960) 
proposed a “branch-and-bound” method to solve mixed integer linear programs. 
Branch and bound is an enumerative procedure based on dividing the original 
problem into a number of smaller problems (branching) and evaluating their 
quality based on their linear programming relaxations (bounding). Branch and 
bound was the most effective technique for solving mixed integer linear programs 
for multiple decades. However, in the 1990s, cutting planes made a resurgence. 
Current state-of-the-art integer programming solvers combine cutting planes 
and branch and bound into an overall procedure called “branch and cut”, a term 
coined in Padberg and Rinaldi (1987). 

8.4.1 Branch-and-Bound Method


The gist of the branch-and-bound method can be easily grasped via a couple of 
examples. Consider the following integer linear program (see Figure 8.3): 


$$
\begin{array}{rl}
\max & x_1 + x_2 \\
\text{s.t.} & -x_1 + x_2 \le 2 \\
& 8x_1 + 2x_2 \le 19 \\
& x_1, x_2 \ge 0 \\
& x_1, x_2 \in \mathbb{Z}.
\end{array}
\tag{8.12}
$$

Figure 8.3 A two-variable integer program


Step 1. Solve the linear programming relaxation of (8.12), namely

$$
\begin{array}{rl}
\max & x_1 + x_2 \\
\text{s.t.} & -x_1 + x_2 \le 2 \\
& 8x_1 + 2x_2 \le 19 \\
& x_1, x_2 \ge 0.
\end{array}
$$

The solution is $x = [1.5 \;\; 3.5]^{\mathsf T}$ with objective value 5. Thus 5 is an
upper bound on the optimal value of (8.12). The vector $x = [1.5 \;\; 3.5]^{\mathsf T}$
is not a feasible solution to (8.12) since the entries of $x$ are fractional.
How can we exclude this fractional solution while preserving the feasible
integral solutions? One way is to branch: create two new linear programs,
one with the additional constraint $x_1 \le 1$, and the other with the
additional constraint $x_1 \ge 2$. Clearly, any solution to the integer program
must be feasible to one or the other of these two problems. We will solve
both of these linear programs.

Step 2. Solve the first of the two new linear programs:

$$
\begin{array}{rl}
\max & x_1 + x_2 \\
\text{s.t.} & -x_1 + x_2 \le 2 \\
& 8x_1 + 2x_2 \le 19 \\
& x_1 \le 1 \\
& x_1, x_2 \ge 0.
\end{array}
$$

The solution is $x = [1 \;\; 3]^{\mathsf T}$ with objective value 4. This is a feasible
integral solution to (8.12). So now we have the upper bound 5 and the
lower bound 4 on the optimal value of (8.12).

Step 3. Solve the second new linear program:

$$
\begin{array}{rl}
\max & x_1 + x_2 \\
\text{s.t.} & -x_1 + x_2 \le 2 \\
& 8x_1 + 2x_2 \le 19 \\
& x_1 \ge 2 \\
& x_1, x_2 \ge 0.
\end{array}
$$

The solution is $x = [2 \;\; 1.5]^{\mathsf T}$ with objective value 3.5. Because this
value is worse than the lower bound of 4 that we already have, we do not
need any further branching. We conclude that the vector $x = [1 \;\; 3]^{\mathsf T}$
with objective value 4 found in Step 2 is an optimal solution to (8.12).


The solution of the above integer program by branch and bound required the 
solution of three linear programs. These problems can be arranged in a branch- 
and-bound tree, see Figure 8.4. Each node of the tree corresponds to one of the 
problems that were solved. 


Root: $x_1 = 1.5$, $x_2 = 3.5$, $z = 5$
    Branch $x_1 \le 1$: $x_1 = 1$, $x_2 = 3$, $z = 4$. Prune by integrality.
    Branch $x_1 \ge 2$: $x_1 = 2$, $x_2 = 1.5$, $z = 3.5$. Prune by bounds.

Figure 8.4 Branch-and-bound tree for (8.12)


We can stop the enumeration at a node of the branch-and-bound tree for three 
different reasons (when they occur, the node is said to be pruned). 


Pruning by integrality occurs when the corresponding linear program has an
optimal solution that is integral. This occurred in Step 2 in the above
example.

Pruning by bounds occurs when the objective value of the linear program at
that node is worse than the value of the best feasible solution found so
far. This occurred in Step 3 in the above example.

Pruning by infeasibility occurs when the linear program at that node is
infeasible. This did not occur in any of the steps in the above example.


We next illustrate the branch-and-bound method on a slightly modified instance
that leads to a larger branch-and-bound tree. Consider the integer linear
program:

$$
\begin{array}{rl}
\max & 3x_1 + x_2 \\
\text{s.t.} & -x_1 + x_2 \le 2 \\
& 8x_1 + 2x_2 \le 19 \\
& x_1, x_2 \ge 0 \\
& x_1, x_2 \in \mathbb{Z}.
\end{array}
\tag{8.13}
$$


Figure 8.5 depicts the branch-and-bound tree for this problem. 


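Root: $x_1 = 1.5$, $x_2 = 3.5$, $z = 8$
    Branch $x_1 \le 1$: $x_1 = 1$, $x_2 = 3$, $z = 6$. Prune by integrality.
    Branch $x_1 \ge 2$: $x_1 = 2$, $x_2 = 1.5$, $z = 7.5$
        Branch $x_2 \le 1$: $x_1 = 2.125$, $x_2 = 1$, $z = 7.375$
            Branch $x_1 \le 2$: $x_1 = 2$, $x_2 = 1$, $z = 7$. Prune by integrality.
            Branch $x_1 \ge 3$: Infeasible. Prune.
        Branch $x_2 \ge 2$: Infeasible. Prune.

Figure 8.5 Branch-and-bound tree for (8.13)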


Algorithm 8.1 sketches the branch-and-bound method for a general mixed
integer linear program of the form (8.1). The branch-and-bound method keeps a
list of linear programming problems obtained by relaxing the integrality
requirements on the variables and imposing constraints such as $x_j \le u_j$ or
$x_j \ge l_j$. Each such linear program corresponds to a node of the
branch-and-bound tree. It will be convenient to let $N_i$ denote both a node and
its corresponding linear program in the branch-and-bound tree. Let $x^i$ and
$z_i$ denote respectively the optimal solution and optimal value of the linear
program $N_i$, with the convention $z_i = +\infty$ if $N_i$ is infeasible. Let
$N_0$ denote the root node of the branch-and-bound tree: it corresponds to the
linear programming relaxation (8.2) of (8.1). Throughout the algorithm we let
$\mathcal{L}$ denote the list of nodes that must still be solved. These are the
nodes that have been neither pruned nor branched on. Throughout the algorithm
$x^*$ denotes the best feasible solution found so far and $z_U$ its objective
value. The value $z_U$ is also the best upper bound on the optimal value $z$ of
(8.1) so far. Initially, the bound $z_U$ can be derived from a heuristic
solution to (8.1), or it can be set to $+\infty$ if no heuristic solution is
available.


Algorithm 8.1 Branch-and-bound method

1: $\mathcal{L} := \{N_0\}$, $z_U := +\infty$, $x^* := \emptyset$ (initialization)
2: if $\mathcal{L} = \emptyset$ then HALT and return the vector $x^*$ end if (termination)
3: choose and delete a node $N_i$ from $\mathcal{L}$ and solve it (select next node to solve)
4: if $z_i \ge z_U$ then go to step 2 end if (prune $N_i$)
5: if $x^i$ is feasible for (8.1) then (update upper bound and prune)
       $z_U := z_i$, $x^* := x^i$
       delete from $\mathcal{L}$ all nodes $N_k$ with $z_k \ge z_U$
       go to step 2
6: else (branch from $N_i$)
       choose $j \in J$ such that $x^i_j \notin \mathbb{Z}$
       branch on variable $x_j$, that is, construct two new linear programs
           $N_i^1$: add the new constraint $x_j \le \lfloor x^i_j \rfloor$ to $N_i$
           $N_i^2$: add the new constraint $x_j \ge \lceil x^i_j \rceil$ to $N_i$
       add $N_i^1$, $N_i^2$ to $\mathcal{L}$ and go to step 2
7: end if


There are a variety of strategies for node selection (step 3) and for branching
(step 6). Even more important to the success of branch and bound is the ability
to prune the tree (steps 4 and 5). This will occur when $z_U$ is a good upper
bound and when $z_i$ is a good lower bound. For this reason, it is crucial to have a
formulation of (8.1) whose linear programming relaxation has an optimal value
$z_{LP}$ as close as possible to the optimal value $z$ of (8.1).
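
The steps of the branch-and-bound method can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: it is written for the two-variable maximization examples (8.12) and (8.13) rather than for the general minimization form (8.1), so the roles of upper and lower bounds are swapped relative to Algorithm 8.1. The helper names `solve_lp_2d` and `branch_and_bound` are hypothetical, the linear programs are solved by enumerating vertices (which assumes a bounded feasible region), and nodes are processed first in, first out.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations
from math import ceil, floor


def solve_lp_2d(c, cons):
    """Maximize c[0]*x1 + c[1]*x2 subject to a*x1 + b*x2 <= r for (a, b, r) in cons.

    Enumerates the intersections of all pairs of constraint lines and keeps the
    best feasible one. Returns (value, point), or None if the LP is infeasible.
    """
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel lines do not define a vertex
        x = (Fraction(r1 * b2 - r2 * b1, det), Fraction(a1 * r2 - a2 * r1, det))
        if all(a * x[0] + b * x[1] <= r for a, b, r in cons):
            value = c[0] * x[0] + c[1] * x[1]
            if best is None or value > best[0]:
                best = (value, x)
    return best


def branch_and_bound(c, cons):
    """Return (optimal value, optimal point, number of nodes solved)."""
    best_value, best_x, nodes = None, None, 0
    queue = deque([list(cons)])  # the list of nodes still to be solved
    while queue:
        node = queue.popleft()
        nodes += 1
        lp = solve_lp_2d(c, node)
        if lp is None:
            continue  # prune by infeasibility
        value, x = lp
        if best_value is not None and value <= best_value:
            continue  # prune by bounds
        fractional = [j for j in range(2) if x[j].denominator != 1]
        if not fractional:
            best_value, best_x = value, x  # prune by integrality
            continue
        j = fractional[0]  # branch on the first fractional variable
        a = (1, 0) if j == 0 else (0, 1)
        queue.append(node + [(a[0], a[1], floor(x[j]))])      # x_j <= floor
        queue.append(node + [(-a[0], -a[1], -ceil(x[j]))])    # x_j >= ceil
    return best_value, best_x, nodes


# Constraints shared by (8.12) and (8.13): -x1 + x2 <= 2, 8x1 + 2x2 <= 19, x >= 0.
CONS = [(-1, 1, 2), (8, 2, 19), (-1, 0, 0), (0, -1, 0)]
result_812 = branch_and_bound((1, 1), CONS)
result_813 = branch_and_bound((3, 1), CONS)
```

With this node order the sketch solves three linear programs for (8.12) and seven for (8.13), matching the trees in Figures 8.4 and 8.5. A different node-selection or branching rule could produce a different tree.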


8.4.2 Cutting-Plane Method


A valid inequality for a mixed integer linear program is a linear inequality 
that is satisfied by all feasible solutions. A cutting plane of a mixed integer 
linear program is a valid inequality that cuts off some solutions to its linear 
programming relaxation. 

As we noted in Proposition 8.3, if an optimal solution of the linear
programming relaxation satisfies the integrality constraints of a mixed integer
linear program, then it is an optimal solution to the mixed integer linear program.
The gist of cutting-plane methods is the observation that, when the latter does
not occur, the linear programming relaxation can be strengthened by adding a
cutting plane that cuts off its optimal solution.

Gomory (1960) proposed the following approach for solving mixed integer
linear programs. Assume the variables in the problem are non-negative and
satisfy the equality constraint

$$\sum_{j \in J} a_j x_j + \sum_{j \notin J} a_j x_j = b. \tag{8.14}$$

Assume that $b$ is not an integer and let $f_0$ be its fractional part, i.e.
$b = \lfloor b \rfloor + f_0$, where $0 < f_0 < 1$. For $j \in J$, let
$a_j = \lfloor a_j \rfloor + f_j$, where $0 \le f_j < 1$. Replacing in
(8.14) and moving sums of integer products to the right, we get

$$\sum_{j \in J:\, f_j \le f_0} f_j x_j + \sum_{j \in J:\, f_j > f_0} (f_j - 1) x_j + \sum_{j \notin J} a_j x_j = k + f_0,$$

where $k$ is some integer. Using the fact that $k \le -1$ or $k \ge 0$, we must have
either

$$\sum_{j \in J:\, f_j \le f_0} \frac{f_j}{f_0} x_j - \sum_{j \in J:\, f_j > f_0} \frac{1 - f_j}{f_0} x_j + \sum_{j \notin J} \frac{a_j}{f_0} x_j \ge 1$$

or

$$-\sum_{j \in J:\, f_j \le f_0} \frac{f_j}{1 - f_0} x_j + \sum_{j \in J:\, f_j > f_0} \frac{1 - f_j}{1 - f_0} x_j - \sum_{j \notin J} \frac{a_j}{1 - f_0} x_j \ge 1.$$

This is of the form $\sum_j c_j x_j \ge 1$ or $\sum_j d_j x_j \ge 1$, which implies
$\sum_j \max(c_j, d_j)\, x_j \ge 1$ because the variables $x_j$ are non-negative.

Which is the larger of the two coefficients $c_j$ and $d_j$ in our case? The answer
is easy since one coefficient is positive and the other is negative for each variable
$x_j$. Therefore, we get

$$\sum_{j \in J:\, f_j \le f_0} \frac{f_j}{f_0} x_j + \sum_{j \in J:\, f_j > f_0} \frac{1 - f_j}{1 - f_0} x_j + \sum_{j \notin J:\, a_j > 0} \frac{a_j}{f_0} x_j - \sum_{j \notin J:\, a_j < 0} \frac{a_j}{1 - f_0} x_j \ge 1. \tag{8.15}$$

Inequality (8.15) is valid for all $x \ge 0$ that satisfy (8.14) with $x_j$ integer for
$j \in J$. It is called a Gomory mixed integer cut.
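
Formula (8.15) translates directly into a coefficient-by-coefficient computation. The following is a minimal sketch, assuming the row is supplied as two dictionaries keyed by variable index (integer variables and continuous variables) with exact `Fraction` entries. The function name `gmi_cut` and this data layout are illustrative choices, not part of the text.

```python
from fractions import Fraction
from math import floor


def gmi_cut(int_coefs, cont_coefs, b):
    """Coefficients of the Gomory mixed integer cut (8.15) for the row

        sum_j int_coefs[j] * x_j + sum_j cont_coefs[j] * x_j = b,

    where the variables in int_coefs are integer and all variables are >= 0.
    Returns (cut_int, cut_cont); the cut reads
        sum_j cut_int[j] * x_j + sum_j cut_cont[j] * x_j >= 1.
    """
    f0 = b - floor(b)  # fractional part of the right-hand side
    if f0 == 0:
        raise ValueError("the right-hand side must be fractional")
    cut_int = {}
    for j, a in int_coefs.items():
        fj = a - floor(a)  # fractional part of the coefficient
        cut_int[j] = fj / f0 if fj <= f0 else (1 - fj) / (1 - f0)
    cut_cont = {}
    for j, a in cont_coefs.items():
        cut_cont[j] = a / f0 if a > 0 else -a / (1 - f0)
    return cut_int, cut_cont


# Row x1 - 0.2*x3 + 0.1*x4 = 1.5 with x1, x2 integer and x3, x4 continuous.
cut_int, cut_cont = gmi_cut(
    {1: Fraction(1), 2: Fraction(0)},
    {3: Fraction(-1, 5), 4: Fraction(1, 10)},
    Fraction(3, 2),
)
```

For this row the sketch returns zero coefficients on the integer variables and coefficients 2/5 and 1/5 on the continuous ones, i.e. the cut $0.4x_3 + 0.2x_4 \ge 1$.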

We illustrate the use of Gomory cuts on problem (8.12). To that end, we first
add slack variables $x_3$ and $x_4$ to turn the inequality constraints into equalities:

$$
\begin{array}{rl}
\max & x_1 + x_2 \\
\text{s.t.} & -x_1 + x_2 + x_3 = 2 \\
& 8x_1 + 2x_2 + x_4 = 19 \\
& x_1, x_2, x_3, x_4 \ge 0 \\
& x_1, x_2 \in \mathbb{Z}.
\end{array}
$$

Solving the linear programming relaxation by the simplex method we get the
optimal basis $B = \{1, 2\}$ and so the constraints of the linear programming
relaxation can be written as

$$
\begin{array}{l}
x_1 - 0.2x_3 + 0.1x_4 = 1.5 \\
x_2 + 0.8x_3 + 0.1x_4 = 3.5 \\
x_1, x_2, x_3, x_4 \ge 0.
\end{array}
$$

The corresponding basic solution is $x_3 = x_4 = 0$, $x_1 = 1.5$, $x_2 = 3.5$ with
objective value $z = 5$. This solution is not integer. Let us generate the Gomory
cut corresponding to the equation

$$x_1 - 0.2x_3 + 0.1x_4 = 1.5.$$

We have $f_0 = 0.5$, $f_1 = f_2 = 0$, $a_3 = -0.2$ and $a_4 = 0.1$. Applying formula
(8.15), we get the Gomory cut

$$\frac{0.2}{1 - 0.5}\, x_3 + \frac{0.1}{0.5}\, x_4 \ge 1, \quad \text{i.e.,} \quad 2x_3 + x_4 \ge 5.$$

Since $x_3 = 2 + x_1 - x_2$ and $x_4 = 19 - 8x_1 - 2x_2$, we can express the above Gomory
cut in terms of $x_1, x_2$:

$$3x_1 + 2x_2 \le 9.$$


Figure 8.6 Formulation strengthened by a cut

Adding this cut to the linear programming relaxation, we get the following
strengthened linear programming relaxation (see Figure 8.6):

$$
\begin{array}{rl}
\max & x_1 + x_2 \\
\text{s.t.} & -x_1 + x_2 \le 2 \\
& 8x_1 + 2x_2 \le 19 \\
& 3x_1 + 2x_2 \le 9 \\
& x_1, x_2 \ge 0.
\end{array}
$$

The optimal solution to this linear program is $x_1 = 1$, $x_2 = 3$ with objective
value $z = 4$. Since $x_1$ and $x_2$ are integer, this is the optimal solution to (8.12).
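
As a sanity check on this example, one can verify numerically that the cut $3x_1 + 2x_2 \le 9$ keeps every integer feasible point of (8.12) while cutting off the fractional solution $(1.5, 3.5)$. The short sketch below does this by direct enumeration; the function names are hypothetical and the enumeration ranges are chosen by hand to cover the feasible region of this particular instance.

```python
from fractions import Fraction


def feasible(x1, x2):
    """Constraints of the linear programming relaxation of (8.12)."""
    return -x1 + x2 <= 2 and 8 * x1 + 2 * x2 <= 19 and x1 >= 0 and x2 >= 0


def satisfies_cut(x1, x2):
    """The Gomory cut expressed in the original variables."""
    return 3 * x1 + 2 * x2 <= 9


# All integer feasible points (the constraints force x1 <= 2 and x2 <= 4).
integer_points = [(i, j) for i in range(4) for j in range(11) if feasible(i, j)]

all_kept = all(satisfies_cut(i, j) for i, j in integer_points)
lp_optimum = (Fraction(3, 2), Fraction(7, 2))
cut_off = feasible(*lp_optimum) and not satisfies_cut(*lp_optimum)
best_integer_point = max(integer_points, key=lambda p: p[0] + p[1])
```

Here `all_kept` and `cut_off` both evaluate to true, and the best integer point is $(1, 3)$, in agreement with the solution found above.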


Exercises 


Exercise 8.1 As the leader of an oil exploration drilling venture, you must
determine the best selection of four out of eight possible sites. Label the sites
$s_1, s_2, \dots, s_8$ and the expected profits associated with each as $p_1, p_2, \dots, p_8$.


(a) If site $s_3$ is explored, then sites $s_1$ and $s_2$ must also be explored. Furthermore,
regional development restrictions are such that

(b) exploring sites sg and sy will prevent you from exploring site ss; 

(c) exploring at least one of the sites $s_3$ or $s_4$ will prevent you from exploring
site $s_5$.


The eight expected profits are $p_i = i$ for $i = 1, \dots, 8$. Formulate an integer
program to determine the best exploration scheme and solve numerically with
Solver.


Exercise 8.2 Consider the following projects for possible investments. For each 
project, you are given the NPV as well as the cash outflows required during each 
year (in millions of dollars). 


NPV Year 1 Year 2 Year 3 Year 4 


Project 1 30 12 4 4 0 
Project 2 30 0 12 4 4 
Project 3 20 3 4 4 4 
Project 4 15 10 0 0 0 
Project 5 15 0 11 0 0 
Project 6 15 0 0 12 0 
Project 7 15 0 0 0 13 
Project 8 24 8 8 0 0 
Project 9 18 0 0 10 4 
Project 10 18 0 0 4 10 


No partial investment is allowed in any of these projects. The firm has 18 million 
dollars available for investment each year. 

(a) Formulate an integer linear program to determine the best investment plan.
(b) Formulate the following conditions as linear constraints. 
(i) Exactly one of the projects 4, 5, 6, 7 must be invested in. 
(ii) If Project 1 is invested in, then Project 2 cannot be invested in. 
(iii) If Project 3 is invested in, then Project 4 must also be invested in. 
(iv) If Project 8 is invested in, then either Project 9 or Project 10 or both 
must also be invested in. 
(v) If either Project 1 or Project 2 is invested in, then neither Project 9 
nor Project 10 can be invested in. 


Exercise 8.3 Consider the problem 


$$
\begin{array}{rl}
\max & 20x_1 + 10x_2 + 10x_3 \\
\text{s.t.} & 2x_1 + 20x_2 + 4x_3 \le 15 \\
& 6x_1 + 20x_2 + 4x_3 = 20 \\
& x_1, x_2, x_3 \ge 0 \\
& x_1, x_2, x_3 \in \mathbb{Z}.
\end{array}
$$


Solve its linear programming relaxation. Then, show that it is impossible to 
obtain a feasible integral solution by rounding the values of the variables. 


Exercise 8.4 
(a) Compare the feasible solutions of the following three integer linear programs:

$$
\begin{array}{rl}
\max & 14x_1 + 8x_2 + 6x_3 + 6x_4 \\
\text{s.t.} & 28x_1 + 15x_2 + 13x_3 + 12x_4 \le 39 \\
& x_1, x_2, x_3, x_4 \in \{0, 1\},
\end{array}
\tag{i}
$$

$$
\begin{array}{rl}
\max & 14x_1 + 8x_2 + 6x_3 + 6x_4 \\
\text{s.t.} & 2x_1 + x_2 + x_3 + x_4 \le 2 \\
& x_1, x_2, x_3, x_4 \in \{0, 1\},
\end{array}
\tag{ii}
$$

$$
\begin{array}{rl}
\max & 14x_1 + 8x_2 + 6x_3 + 6x_4 \\
\text{s.t.} & 2x_1 + x_2 + x_3 + x_4 \le 2 \\
& x_1 + x_2 \le 1 \\
& x_1 + x_3 \le 1 \\
& x_1 + x_4 \le 1 \\
& x_1, x_2, x_3, x_4 \in \{0, 1\}.
\end{array}
\tag{iii}
$$

(b) Compare the relaxations of the above integer linear programs obtained by
replacing $x_1, x_2, x_3, x_4 \in \{0, 1\}$ by $0 \le x_j \le 1$ for $j = 1, \dots, 4$. Which is the
best formulation among (i), (ii), (iii) for obtaining a tight bound from the
linear programming relaxation?


Exercise 8.5 Prove Proposition 8.3. 


Exercise 8.6 Prove Proposition 8.4. 

Exercise 8.7 Prove that the function $u \mapsto L(u)$ defined in (8.6) is a concave
function.


Exercise 8.8 Prove Proposition 8.5. 


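Exercise 8.9 Let $z_{LP}$ denote the value of the linear programming relaxation
(8.2) and let $z_{LD}$ be the value of the Lagrangian dual of the following
Lagrangian relaxation of (8.1):

$$
\begin{array}{rl}
L(u) := \min & c^{\mathsf T} x + u^{\mathsf T} (b - Ax) \\
\text{s.t.} & Dx \ge d \\
& x_j \in \mathbb{Z}, \; j \in J.
\end{array}
$$

Prove that $z_{LP} \le z_{LD}$.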


Exercise 8.10 Use the branch-and-bound method to solve the binary linear 
programming model: 


$$
\begin{array}{rl}
\max & 8x_1 + 11x_2 + 6x_3 + 4x_4 \\
\text{s.t.} & 6.7x_1 + 10x_2 + 5.5x_3 + 3.4x_4 \le 19 \\
& 8x_1 + 2x_2 \le 19 \\
& x_1, x_2, x_3, x_4 \in \{0, 1\}.
\end{array}
$$

Compare the number of nodes in the branch-and-bound tree with the following
naive brute-force enumeration approach: check each of the $2^4 = 16$ possible
values of $x = [x_1 \;\; x_2 \;\; x_3 \;\; x_4]$ with $x_i \in \{0, 1\}$, for $i = 1, \dots, 4$, and pick the
best feasible solution among them.


Exercise 8.11 Solve the integer linear programs of Exercise 8.4 using your 
favorite solver. In each case, report the number of nodes in the enumeration 
tree. Is it related to the tightness of the linear programming relaxation studied 
in Exercise 8.4(b)? 


Exercise 8.12 Modify the branch-and-bound method (Algorithm 8.1) so that 
it stops as soon as it has found a feasible solution that is guaranteed to be within 
5% of the optimum. 


Exercise 8.13 Consider the integer program 


$$
\begin{array}{rl}
\max & 10x_1 + 13x_2 \\
\text{s.t.} & 10x_1 + 14x_2 \le 43 \\
& x_1, x_2 \ge 0 \\
& x_1, x_2 \in \mathbb{Z}.
\end{array}
$$


(a) Introduce a slack variable and solve the linear programming relaxation by 
the simplex method. 
Hint: You should find the following optimal tableau:

$$
\begin{array}{rl}
\max & -x_2 - x_3 \\
\text{s.t.} & x_1 + 1.4x_2 + 0.1x_3 = 4.3 \\
& x_1, x_2, x_3 \ge 0
\end{array}
$$

with basic solution $x_1 = 4.3$, $x_2 = x_3 = 0$.

(b) Generate a Gomory mixed integer (GMI) cut that cuts off this solution.

(c) Multiply both sides of the equation $x_1 + 1.4x_2 + 0.1x_3 = 4.3$ by the constant
$k = 2$ and generate the corresponding GMI cut. Repeat for $k = 3, 4$ and 5.
Compare the five GMI cuts that you found.

(d) Add the GMI cut generated for $k = 3$ to the linear programming relaxation.
Solve the resulting linear program by the simplex method. What is the
optimum solution of the integer program?


Mixed Integer Programming Models: Portfolios with Combinatorial Constraints


This chapter presents several applications of integer and mixed integer
programming, namely combinatorial auctions, the lockbox problem, constructing an
index fund, and portfolio optimization with cardinality and threshold constraints.
All of these applications involve combinatorial features that can be modeled via
binary variables.


9.1 Combinatorial Auctions


A combinatorial auction is an auction that involves the concurrent sale of
multiple items. Examples include Federal Communications Commission (FCC)
spectrum auctions, electricity markets, pollution right auctions, and auctions for
airport landing slots. In these kinds of auctions, bidders have preferences for
sets of items usually called bundles. The value that a bidder has for a bundle
may not necessarily be equal to the sum of the values that the bidder has for
individual items in the bundle. To take the bidders’ preferences into
consideration, combinatorial auctions allow bidders to submit bids on combinations of
items.

Specifically, let $M$ be the set of items that the auctioneer has to sell and $N$
the set of bidders. A bid by bidder $j \in N$ is a pair $(S, b_j(S))$, where
$S \subseteq M$ and $b_j(S)$ is the price that bidder $j$ is willing to pay for the
bundle $S$. The combinatorial auction problem or winner selection problem is the
problem of identifying which bids should be accepted to maximize the auctioneer’s
revenue. This problem can be formulated as a binary linear program.


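Binary linear programming model for the combinatorial auction problem
Variables:

$$
x(S, j) = \begin{cases} 1 & \text{if bundle } S \text{ is allocated to bidder } j \\ 0 & \text{otherwise} \end{cases}
\qquad \text{for } S \subseteq M,\ j \in N.
$$

Objective:

$$\max \sum_{S \subseteq M} \sum_{j \in N} b_j(S)\, x(S, j).$$

Constraints:

$$
\begin{array}{ll}
\displaystyle \sum_{S \subseteq M:\, i \in S} \; \sum_{j \in N} x(S, j) \le 1 \quad \text{for } i \in M & \text{(allocated bundles do not overlap)} \\[2ex]
x(S, j) \in \{0, 1\} \quad \text{for } S \subseteq M,\ j \in N & \text{(binary variables).}
\end{array}
$$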


In some combinatorial auctions, bidders are awarded at most one of the bundles
that they bid on, even when these bundles are disjoint. This is easy to model by
adding the constraint

$$\sum_{S \subseteq M} x(S, j) \le 1 \quad \text{for } j \in N \quad \text{(each bidder receives at most one bundle).}$$


For example, if there are four items for sale and the following bids have been
received: $B_1 = (\{1\}, 6)$, $B_2 = (\{2\}, 3)$, $B_3 = (\{3, 4\}, 12)$, $B_4 = (\{1, 3\}, 12)$,
$B_5 = (\{2, 4\}, 8)$, $B_6 = (\{1, 3, 4\}, 16)$, the winners can be determined by the
following integer program:

$$
\begin{array}{rll}
\max & 6x_1 + 3x_2 + 12x_3 + 12x_4 + 8x_5 + 16x_6 \\
\text{s.t.} & x_1 + x_4 + x_6 \le 1 & \text{(item 1 is allocated at most once)} \\
& x_2 + x_5 \le 1 & \text{(item 2 is allocated at most once)} \\
& x_3 + x_4 + x_6 \le 1 & \text{(item 3 is allocated at most once)} \\
& x_3 + x_5 + x_6 \le 1 & \text{(item 4 is allocated at most once)} \\
& x_j \in \{0, 1\} & \text{for } j = 1, \dots, 6.
\end{array}
$$

If bids $B_4$ and $B_5$ come from the same bidder who wants at most one of these
two bundles, it suffices to add the constraint

$$x_4 + x_5 \le 1.$$
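
Because this instance has only six bids, the winner selection problem can be checked by brute force. The sketch below enumerates all subsets of bids and keeps the most valuable one whose bundles do not overlap. It is an illustration for tiny instances only (the enumeration grows exponentially with the number of bids), and the function name `winners` and the `exclusive_pairs` argument are hypothetical conveniences for the extra "at most one of these two bundles" constraint.

```python
from itertools import combinations

# Bid k -> (bundle of items, price), as in the example above.
BIDS = {
    1: ({1}, 6),
    2: ({2}, 3),
    3: ({3, 4}, 12),
    4: ({1, 3}, 12),
    5: ({2, 4}, 8),
    6: ({1, 3, 4}, 16),
}


def winners(bids, exclusive_pairs=()):
    """Return (revenue, accepted bids) maximizing revenue by full enumeration."""
    best_value, best_set = 0, ()
    ids = sorted(bids)
    for r in range(1, len(ids) + 1):
        for subset in combinations(ids, r):
            items = [i for k in subset for i in bids[k][0]]
            if len(items) != len(set(items)):
                continue  # some item would be allocated more than once
            if any(p in subset and q in subset for p, q in exclusive_pairs):
                continue  # violates an "at most one of these bids" constraint
            value = sum(bids[k][1] for k in subset)
            if value > best_value:
                best_value, best_set = value, subset
    return best_value, best_set


revenue, accepted = winners(BIDS)
revenue_excl, accepted_excl = winners(BIDS, exclusive_pairs=[(4, 5)])
```

For this data the enumeration accepts bids $B_1$, $B_2$ and $B_3$ for a revenue of 21, with or without the additional constraint $x_4 + x_5 \le 1$.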


If there are multiple units $u_i$ of each item $i \in M$, then a bid can be more
broadly defined as a pair $(\lambda, b_j(\lambda))$ where $\lambda$ is an $M$-vector with entries
$\lambda_i \in \{0, 1, \dots, u_i\}$, $i \in M$, that indicates the desired number of units $\lambda_i$ of each item
$i \in M$. Let $\Lambda$ denote the set of all these $M$-vectors. The previous model is
replaced by

$$
\begin{array}{rll}
\max & \displaystyle \sum_{\lambda \in \Lambda} \sum_{j \in N} b_j(\lambda)\, x(\lambda, j) \\
\text{s.t.} & \displaystyle \sum_{\lambda \in \Lambda} \sum_{j \in N} \lambda_i\, x(\lambda, j) \le u_i & \text{for } i \in M \\
& x(\lambda, j) \in \{0, 1\} & \text{for } \lambda \in \Lambda,\ j \in N.
\end{array}
$$

There are further variations of the above formulations that incorporate additional
features such as constraints on the kinds of bids the auctioneer accepts and
constraints on the kinds of bundles that can be allocated to bidders (De Vries
and Vohra, 2003). The gist of these models is essentially the same as those
discussed above.


The Lockbox Problem


Consider a national firm that receives checks from all over the United States. Due 
to the vagaries of the Postal Service, as well as the banking system, there is a vari- 
able delay from when the check is postmarked (and hence the customer has met 
her obligation) and when the check clears (and when the firm can use the money). 
For instance, a check mailed in Pittsburgh sent to a Pittsburgh address might 
clear in just two days. A similar check sent to Los Angeles (L.A.) might take four 
days to clear. It is in the interest of the firm to have the check clear as quickly as 
possible since then the firm can use the money. In order to speed up this clearing, 
firms open offices (called lockboxes) in different cities to handle the checks. 


Example 9.1 Suppose we receive payments from four regions (West, Mid- 
west, East, and South). The average daily value from each region is as follows: 
$300,000 from the West, $120,000 from the Midwest, $360,000 from the East, 
and $180,000 from the South. We are considering opening lockboxes in L.A., 
Cincinnati, Boston, and/or Houston. Operating a lockbox costs $90,000 per year. 
The average days from mailing to clearing is given in Table 9.1. Which lockboxes 
should we open? 


From L.A. Cincinnati Boston Houston 
West 2 4 6 6 
Midwest 4 2 5 5 
East 6 5 2 5 
South 7 5 6 3 


Table 9.1 Clearing times 


First we must calculate the losses due to lost interest for each possible assign- 
ment. For example, if the West sends to Boston, then on average there will be 
$1,800,000 (= 6 x $300, 000) in process on any given day. Assuming an investment 
rate of 10%, this corresponds to a yearly loss of $180,000. We can calculate the 
losses for the other possibilities in a similar fashion to get Table 9.2. 


From L.A. Cincinnati Boston Houston 
West 60 120 180 180 
Midwest 48 24 60 60 
East 216 180 72 180 
South 126 90 108 54 


Table 9.2 Lost interest (’000) 
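
The entries of Table 9.2 follow mechanically from Table 9.1: days in transit times average daily value times the 10% investment rate. A small sketch of that arithmetic is given below; the variable names are illustrative, and amounts are kept in thousands of dollars with the rate expressed as an integer percentage so that the arithmetic stays exact.

```python
# Average daily value received from each region, in thousands of dollars.
DAILY_VALUE = {"West": 300, "Midwest": 120, "East": 360, "South": 180}

# Table 9.1: average days from mailing to clearing, in the column order
# L.A., Cincinnati, Boston, Houston.
CLEARING_DAYS = {
    "West": [2, 4, 6, 6],
    "Midwest": [4, 2, 5, 5],
    "East": [6, 5, 2, 5],
    "South": [7, 5, 6, 3],
}

RATE_PERCENT = 10  # assumed yearly investment rate of 10%

# Yearly lost interest = days in process * daily value * rate.
LOST_INTEREST = {
    region: [d * DAILY_VALUE[region] * RATE_PERCENT // 100 for d in days]
    for region, days in CLEARING_DAYS.items()
}
```

For instance, `LOST_INTEREST["West"][2]` is 180, the $180,000 yearly loss computed above for checks sent from the West to Boston.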


The intuition for the formulation of the lockbox problem is similar to that of 
the formulation of the K-median model for clustering discussed in Chapter 8. 
We use a set of binary variables to model the lockboxes to open and another set 
of binary variables to model what lockbox serves each region. 
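
Binary linear programming model for the lockbox problem
Variables:

$$
y_j = \begin{cases} 1 & \text{if lockbox } j \text{ is opened} \\ 0 & \text{otherwise} \end{cases}
\qquad \text{for } j = 1, \dots, 4,
$$

$$
x_{ij} = \begin{cases} 1 & \text{if region } i \text{ is served by lockbox } j \\ 0 & \text{otherwise} \end{cases}
\qquad \text{for } i, j = 1, \dots, 4.
$$

Objective: minimize total yearly costs

$$\min \sum_{i=1}^{4} \sum_{j=1}^{4} c_{ij} x_{ij} + 90 \sum_{j=1}^{4} y_j,$$

where $c_{ij}$ is the $(i, j)$ entry in Table 9.2.
Constraints:

$$
\begin{array}{ll}
\displaystyle \sum_{j=1}^{4} x_{ij} = 1 \quad \text{for } i = 1, \dots, 4 & \text{(each region must be assigned to one lockbox)} \\[2ex]
x_{ij} \le y_j \quad \text{for } i, j = 1, \dots, 4 & \text{(region } i \text{ is assigned to lockbox } j \text{ only if } j \text{ is opened)} \\[1ex]
x_{ij}, y_j \in \{0, 1\} \quad \text{for } i, j = 1, \dots, 4 & \text{(binary variables).}
\end{array}
$$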

As we observed for the binary programming formulation for clustering
discussed in Example 8.2, the above formulation would also be correct if we replaced
the $4^2 = 16$ constraints

$$x_{ij} \le y_j, \quad i, j = 1, \dots, 4,$$

with the four constraints

$$\sum_{i=1}^{4} x_{ij} \le 4 y_j, \quad j = 1, \dots, 4.$$

However, we should note that the solution to the linear programming relaxation
of the first formulation (with more constraints) is

$$x_{11} = x_{21} = x_{33} = x_{43} = y_1 = y_3 = 1, \text{ and all other variables zero,}$$

which has binary entries and hence is an optimal solution to the binary linear
programming model. Therefore the firm should open two lockboxes, one in the
Eastern region (Boston) and one in the West (L.A.).

By contrast, the solution to the linear programming relaxation of the second
formulation (with fewer constraints) is

$$x_{11} = x_{22} = x_{33} = x_{44} = 1, \quad y_1 = y_2 = y_3 = y_4 = 0.25, \text{ and all other variables zero,}$$

which does not give any useful information about the binary linear programming
model.

This example highlights how different equivalent integer programming
formulations can have very different properties with respect to their associated linear
programming relaxations.
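
With only four candidate lockboxes, the optimal solution of the binary model can be confirmed by enumerating every non-empty set of open lockboxes and assigning each region to its cheapest open lockbox. The sketch below does exactly that. It is a brute-force check for this small instance, not a substitute for an integer programming solver, and the function name `best_lockboxes` is hypothetical.

```python
from itertools import combinations

# Table 9.2: yearly lost interest in thousands of dollars.
# Rows: West, Midwest, East, South. Columns: L.A., Cincinnati, Boston, Houston.
COST = [
    [60, 120, 180, 180],
    [48, 24, 60, 60],
    [216, 180, 72, 180],
    [126, 90, 108, 54],
]
CITIES = ["L.A.", "Cincinnati", "Boston", "Houston"]
FIXED_COST = 90  # yearly operating cost per lockbox, in thousands of dollars


def best_lockboxes(cost, fixed_cost):
    """Return (total yearly cost, indices of opened lockboxes)."""
    n = len(cost[0])
    best = None
    for r in range(1, n + 1):
        for opened in combinations(range(n), r):
            # Each region uses the cheapest lockbox among the opened ones.
            total = fixed_cost * r + sum(min(row[j] for j in opened) for row in cost)
            if best is None or total < best[0]:
                best = (total, opened)
    return best


total_cost, opened = best_lockboxes(COST, FIXED_COST)
opened_cities = [CITIES[j] for j in opened]
```

The enumeration opens the L.A. and Boston lockboxes at a total yearly cost of 468 (thousand dollars), which agrees with the solution of the first linear programming relaxation.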


Constructing an Index Fund


An old and recurring debate about investing lies in the merits of active versus 
passive management of a portfolio. Active portfolio management tries to achieve 
superior performance by using technical and fundamental analysis. On the other 
hand, passive portfolio management relies entirely on diversification to achieve 
a desired performance. There are two types of passive management strategies: 
“buy and hold” or “indexing”. In the first one, assets are selected on the basis of 
some fundamental criteria and there is no active selling or buying of these stocks 
afterwards (see the chapters on dedication (Chapter 3) and portfolio optimization 
(Chapter 6)). In the second approach, absolutely no attempt is made to identify 
mispriced securities. The goal is to choose a portfolio that mirrors the movements 
of a broad market population or a market index. Such a portfolio is called an 
index fund. Given a target population of n stocks, one selects K stocks (and 
their weights in the index fund) to represent the target population as closely as 
possible. 

In the last 30 years, an increasing number of investors, both large and small, 
have established index funds. Simply defined, an index fund is a portfolio 
designed to track the movement of the market as a whole or some selected 
broad market segment. The rising popularity of index funds can be justified 
both theoretically and empirically. 


Market efficiency: If the market is efficient, no superior risk-adjusted returns 
can be achieved by stock picking strategies since the prices reflect all the 
information available in the marketplace. Additionally, since the market 
portfolio provides the best possible return per unit of risk, to the extent 
that it captures the efficiency of the market via diversification, one may 
argue that the best theoretical approach to fund management is to invest 
in an index fund. 

Empirical performance: Considerable empirical literature provides strong 
evidence that, on average, money managers have consistently under- 
performed the major indices. In addition, studies show that, in most 
cases, top performing funds for a year are no longer amongst the top 
performers in the following years, leaving room for the intervention of 
luck as an explanation for good performance. 

Transaction cost: Actively managed funds incur transaction costs, which 
reduce the overall performance of these funds. In addition, active 
management implies significant research costs. Finally, fund managers 
may have costly compensation packages that can be avoided to a large 
extent with index funds. 


Here we take the point of view of a fund manager who wants to construct an 
index fund. Strategies for forming index funds involve choosing a broad market 
index as a proxy for an entire market, e.g. the Standard & Poor’s list of 500 
stocks (S&P 500). A pure indexing approach consists in purchasing all the issues
in the index, with exactly the same weights as in the index. In most instances,
this approach is impractical (many small positions) and expensive (rebalancing 
costs may be incurred frequently). An index fund with K stocks, where K is 
substantially smaller than the size n of the target population, seems desirable. 
The clustering approach introduced in Chapter 8 can be used to aggregate the 
stocks in a broad market into a smaller more manageable index fund. This 
approach will not necessarily yield mean/variance-efficient portfolios but will 
produce a portfolio that closely replicates the underlying market population. 

We describe a two-step heuristic approach for constructing an index fund. 
First, select K stocks to be included in the portfolio. Second, determine weights 
for these stocks so that the portfolio is as close as possible to the benchmark. 
The motivation for this two-step approach is that each of the stocks selected in 
the portfolio is a proxy for a portion of stocks in the index. 

The first step, that is, the selection of the $K$ stocks to be included in the
portfolio, can be formulated as the binary linear programming formulation for
clustering for Example 8.2 in Chapter 8. Recall that the model is based on the
following data:

$$\rho_{ij} = \text{similarity between stock } i \text{ and stock } j.$$

An example of this is the correlation between the returns of stocks $i$ and $j$. But
one could choose other similarity measures $\rho_{ij}$.
Recall the binary linear programming formulation for the clustering problem:

$$
\begin{array}{rll}
\max & \displaystyle \sum_{i=1}^{n} \sum_{j=1}^{n} \rho_{ij} x_{ij} \\
\text{s.t.} & \displaystyle \sum_{j=1}^{n} y_j = K \\
& \displaystyle \sum_{j=1}^{n} x_{ij} = 1 & \text{for } i = 1, \dots, n \\
& x_{ij} \le y_j & \text{for } i, j = 1, \dots, n \\
& x_{ij}, y_j \in \{0, 1\} & \text{for } i, j = 1, \dots, n.
\end{array}
$$

As discussed in Chapter 8, the variables $y_j$ describe which stocks $j$ are in the
portfolio ($y_j = 1$ if $j$ is selected in the portfolio, 0 otherwise). For each stock
$i = 1, \dots, n$, the variable $x_{ij}$ indicates which stock $j$ in the portfolio is most
similar to $i$ ($x_{ij} = 1$ if $j$ is the most similar stock in the portfolio, 0 otherwise).

Once the set of $K$ stocks has been selected, a simple approach to the second
step of portfolio construction is as follows. Assume $j_1, \dots, j_K$ are the selected
stocks and $C_1, \dots, C_K$ are the corresponding clusters. That is, $C_\ell$ is the set of
stocks represented by stock $j_\ell$ for $\ell = 1, \dots, K$. Set the weight of each selected
stock $j_\ell$, $\ell = 1, \dots, K$, proportional to the total market capitalization of the
stocks in $C_\ell$:

$$w_{j_\ell} = \frac{\sum_{i \in C_\ell} V_i}{\sum_{i=1}^{n} V_i},$$

where $V_i$ is the market capitalization of stock $i$.
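
The two-step heuristic can be illustrated on a toy universe. In the sketch below, the four-stock similarity matrix `RHO` (integer percentages) and the market capitalizations `CAPS` are made-up numbers chosen only for illustration, and the selection step is carried out by enumerating all subsets of size $K$ instead of solving the binary linear program, which is only viable for very small $n$. The function name `build_index_fund` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical similarity matrix (in percent) and market capitalizations.
RHO = [
    [100, 90, 80, 20],
    [90, 100, 85, 30],
    [80, 85, 100, 25],
    [20, 30, 25, 100],
]
CAPS = [50, 30, 15, 5]


def build_index_fund(rho, caps, K):
    """Select K representative stocks, form clusters, and set cap-based weights."""
    n = len(rho)
    # Step 1: choose the K stocks maximizing total similarity, where each
    # stock i is matched with its most similar selected stock.
    best = None
    for selected in combinations(range(n), K):
        total = sum(max(rho[i][j] for j in selected) for i in range(n))
        if best is None or total > best[0]:
            best = (total, selected)
    _, selected = best
    clusters = {j: [] for j in selected}
    for i in range(n):
        representative = max(selected, key=lambda j: rho[i][j])
        clusters[representative].append(i)
    # Step 2: weight each selected stock by the market cap of its cluster.
    cap_total = sum(caps)
    weights = {
        j: Fraction(sum(caps[i] for i in members), cap_total)
        for j, members in clusters.items()
    }
    return selected, clusters, weights


selected, clusters, weights = build_index_fund(RHO, CAPS, K=2)
```

On this toy data the sketch selects stocks 1 and 3 (zero-based indices), with stock 1 representing stocks 0, 1 and 2 and receiving weight 95/100, and stock 3 representing only itself with weight 5/100.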

The second step can alternatively be tackled via a linear or a quadratic 
programming model. The variables in the model are the portfolio weights on 
the selected stocks. A reasonable objective is to minimize a measure associated 
with the quality of tracking such as active risk — this would lead to a quadratic 
programming problem. Alternatively, one can minimize mean absolute deviation 
and obtain a linear programming problem. The constraints could include bounds 
on beta, sector exposures, and other attributes to find weights of the selected 
stocks so that the portfolio is as close as possible to the benchmark. 


Cardinality Constraints 


In this section, we present a different approach for tracking a basket of assets, e.g.,
an index, with a small group of stocks. In contrast to the two-step approach in the
previous section, we model the index replication problem in one step, as a mixed
integer programming problem with cardinality constraints. For concreteness,
consider the case when we want to track a benchmark with a portfolio containing
a predetermined maximum number of stocks. Assume $x_B$ is the vector of holdings
in the benchmark, and $x$ is the vector of holdings in the portfolio. Suppose we
want to include at most $K$ stocks in the tracking portfolio. If we have an estimate
$V$ of the covariance matrix of the universe of stocks in the index, then the problem
can be informally stated as follows:

$$
\begin{array}{rl}
\min & (x - x_B)^{\mathsf T} V (x - x_B) \\
\text{s.t.} & \mathbf{1}^{\mathsf T} x = 1 \\
& x \ge 0 \\
& x_j > 0 \text{ for at most } K \text{ distinct } j = 1, \dots, n.
\end{array}
\tag{9.1}
$$

In order to model the problem formally, we introduce a new set of binary variables
whose role is to model the logical condition of whether each particular stock is
included in the portfolio:

$$
y_j = \begin{cases} 1 & \text{if } x_j > 0 \\ 0 & \text{otherwise.} \end{cases}
$$

The problem (9.1) can be reformulated as the following mixed quadratic program:

$$
\begin{array}{rll}
\min & (x - x_B)^{\mathsf T} V (x - x_B) \\
\text{s.t.} & \mathbf{1}^{\mathsf T} x = 1 \\
& x \ge 0 \\
& x_j \le y_j & \text{for } j = 1, \dots, n \\
& \displaystyle \sum_{j=1}^{n} y_j \le K \\
& y_j \in \{0, 1\} & \text{for } j = 1, \dots, n.
\end{array}
\tag{9.2}
$$

Observe that the linking constraint $x_j \le y_j$ in (9.2) is a mathematical way of
encoding the logical connection between $x_j$ and $y_j$: the variable $x_j$ is positive only
when $y_j = 1$. Furthermore, the constraint $\sum_{j=1}^{n} y_j \le K$ enforces the condition
that at most $K$ of the $x$ variables are positive.

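Consider now a more general mean-variance model with cardinality
constraints where we now allow short positions:

$$
\begin{array}{rl}
\min & x^{\mathsf T} V x \\
\text{s.t.} & \mu^{\mathsf T} x \ge \bar{\mu} \\
& Ax = b \\
& Cx \ge d \\
& x_j \ne 0 \text{ for at most } K \text{ distinct } j = 1, \dots, n.
\end{array}
\tag{9.3}
$$

The above approach extends provided that there is a lower bound $\ell_j$ and an
upper bound $u_j$ on the value of each holding $x_j$ for $j = 1, \dots, n$. In this case
the cardinality constraint can be formulated via a new set of binary variables
$y_j$, $j = 1, \dots, n$, together with the linking constraints

$$\ell_j y_j \le x_j \le u_j y_j \quad \text{for } j = 1, \dots, n.$$

The problem (9.3) can be reformulated as the following mixed quadratic program:

$$
\begin{array}{rll}
\min & x^{\mathsf T} V x \\
\text{s.t.} & \mu^{\mathsf T} x \ge \bar{\mu} \\
& Ax = b \\
& Cx \ge d \\
& \ell_j y_j \le x_j \le u_j y_j & \text{for } j = 1, \dots, n \\
& \displaystyle \sum_{j=1}^{n} y_j \le K \\
& y_j \in \{0, 1\} & \text{for } j = 1, \dots, n.
\end{array}
\tag{9.4}
$$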


Minimum Position Constraints 


The same kind of linking constraints $\ell_j y_j \le x_j \le u_j y_j$ used in the mixed
binary programming formulation (9.4) can be used to enforce another common
practical consideration: minimum position constraints. Although diversification
into a broad universe of assets generally has merits, there is a potential downside:
some positions may become very small. Too many small positions typically
generate higher research and monitoring costs. Consequently, investment managers
enforce minimum position constraints. This means that if a stock $j$ is included in
the portfolio, then the holding $x_j$ in that stock must be at least a minimum
threshold $\ell_j > 0$. Consider a general mean-variance model with minimum position
constraints:

\[
\begin{array}{ll}
\min_{x} & x^\top V x \\
\text{s.t.} & \mu^\top x \ge \bar{\mu} \\
& Ax = b \\
& Cx \ge d \\
& x_j > 0 \Rightarrow x_j \ge \ell_j \quad \text{for } j = 1,\dots,n.
\end{array}
\tag{9.5}
\]


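Provided there is an upper bound $u_j$ on the value of each holding $x_j$, for $j =
1,\dots,n$, and proceeding as above, the problem (9.5) can be reformulated as the
following mixed quadratic program:

\[
\begin{array}{ll}
\min_{x,y} & x^\top V x \\
\text{s.t.} & \mu^\top x \ge \bar{\mu} \\
& Ax = b \\
& Cx \ge d \\
& \ell_j y_j \le x_j \le u_j y_j \quad \text{for } j = 1,\dots,n \\
& y_j \in \{0,1\} \quad \text{for } j = 1,\dots,n.
\end{array}
\tag{9.6}
\]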
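The linking constraints used in (9.4) and (9.6) can be checked numerically for a candidate portfolio. The following is a minimal sketch, not a solver: the function names, the tolerance, and the four-stock data are illustrative assumptions. It derives the binary indicators from the holdings and tests the bounds and, optionally, the cardinality limit.

```python
def support_indicators(x, tol=1e-12):
    """Binary y_j: 1 if the holding x_j is nonzero (long or short), else 0."""
    return [1 if abs(xj) > tol else 0 for xj in x]


def satisfies_linking(x, y, lower, upper, max_names=None, tol=1e-12):
    """Check l_j*y_j <= x_j <= u_j*y_j for all j and, optionally, sum(y) <= K."""
    for xj, yj, lj, uj in zip(x, y, lower, upper):
        if yj not in (0, 1):
            return False
        if xj < lj * yj - tol or xj > uj * yj + tol:
            return False
    return max_names is None or sum(y) <= max_names


# Hypothetical data: four stocks, minimum position 5%, maximum position 60%.
lower = [0.05] * 4
upper = [0.60] * 4

x_ok = [0.40, 0.00, 0.60, 0.00]     # two positions, both above the threshold
x_small = [0.38, 0.02, 0.60, 0.00]  # the 2% position violates the minimum

y_ok = support_indicators(x_ok)
y_small = support_indicators(x_small)

ok = satisfies_linking(x_ok, y_ok, lower, upper, max_names=2)
small = satisfies_linking(x_small, y_small, lower, upper)
```

When $y_j = 0$ both bounds collapse to zero, which forces $x_j = 0$; when $y_j = 1$ the holding must lie between its lower and upper bounds.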


Risk-Parity Portfolios and Clustering 


Consider a situation where you are trying to construct a risk-parity portfolio as 
explained in Section 7.5. We would like to allocate equal risk to a set of assets, but 
several of them may be very similar. By using the risk-parity strategy directly, 
you end up overweighting the characteristics those assets share. Instead, you can 
cluster the assets first, and then allocate risk evenly to each cluster. This is more 
consistent with the spirit of “risk parity”. This first step can be accomplished 
using the clustering approach suggested in Example 8.2. 


Exercises 


Exercise 9.1 In a combinatorial exchange, both buyers and sellers can submit 
combinatorial bids. Bids are like in the multiple item case, except that the values 
A; can be negative, as can the prices p; (À), representing selling instead of buying. 
Note that a single bid can be buying some items while selling other items. 
Write an integer linear program that will maximize the surplus generated by
the combinatorial exchange.

Exercise 9.2 You have $250,000 to invest in the following possible investments.
The cash inflows/outflows are as follows: 


Year 1 Year 2 Year 3 Year 4

Investment 1 -1.00 1.18
Investment 2 -1.00 1.22
Investment 3 -1.00 1.10
Investment 4 -1.00 0.14 0.14 1.00
Investment 5 -1.00 0.20 1.00


For example, if you invest one dollar in Investment 1 at the beginning of 
Year 1, you receive $1.18 at the beginning of Year 3. If you invest in any of these 
investments, the required minimum level is $100,000 in each case. Any or all 
the available funds at the beginning of a year can be placed in a money market 
account that yields 3% per year. Formulate a mixed integer linear program to 
maximize the amount of money available at the beginning of Year 4. Solve the 
integer program using your favorite solver. 


Exercise 9.3 Consider a lockbox problem where $c_{ij}$ is the cost of assigning
region $i$ to a lockbox in region $j$, for $i, j \in \{1,\dots,n\}$. Suppose that we wish to
open exactly $K$ lockboxes where $K$ is a given integer, $1 \le K \le n$.


(a) Formulate as an integer linear program the problem of opening K lockboxes 
so as to minimize the total cost of assigning each region to an open lockbox. 

(b) Formulate in two different ways the constraint that regions cannot send 
checks to closed lockboxes. 

(c) For the following data 


$K = 2$ and $(c_{ij}) =$ [cost matrix illegible in the source]


compare the linear programming relaxations of your two formulations in 
part (b). 


Exercise 9.4 You currently own a portfolio of eight stocks. Using the 
Markowitz model, you computed the optimal mean-variance portfolio. The 
weights of these two portfolios are shown in the following table: 


Stock A B C D E F G H 
Your portfolio 0.12 0.15 0.13 0.10 0.20 0.10 0.12 0.08 
M/V portfolio 0.02 0.05 0.25 0.06 0.18 0.10 0.22 0.12 


You would like to rebalance your portfolio in order to be closer to the mean- 
variance portfolio. To avoid excessively high transaction costs, you decide to
rebalance only three stocks from your portfolio. Let $x_i$ denote the weight of
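stock $i$ in your rebalanced portfolio. The objective is to minimize the quantity

\[
|x_A - 0.02| + |x_B - 0.05| + |x_C - 0.25| + |x_D - 0.06| + |x_E - 0.18| + |x_F - 0.10| + |x_G - 0.22| + |x_H - 0.12|,
\]

which measures how closely the rebalanced portfolio matches the mean-variance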
portfolio. 

Formulate this problem as a mixed integer linear program. Note that you will 
need to introduce new continuous variables in order to linearize the absolute 
values and new binary variables in order to impose the constraint that only 
three stocks are traded. 


Case Study 


The goal of this case study is to construct a parsimonious fund that tracks a 
pre-specified market index. 


(1) Choose a stock market index (with at least 25 stocks) to be tracked. Some 
possible choices are the Dow Jones Industrial Average, the S&P 100, and 
the Nasdaq 100. If you feel ambitious, you may choose a larger index. 

Collect recent historical data over a meaningful horizon. Make sure you 
include more observations (ideally many more) than the number of stocks in 
the index. A reasonable choice is a few years (two or three) of weekly data, 
or a few more (six or seven) of monthly data. For larger indices, you may 
want to consider daily data. 

Use the first 70% of your data for calibrating your model; that is, for 
parameter estimation, choice of stocks, choice of weights, etc. Use the remain- 
ing 30% for out-of-sample testing. 

(2) Use some kind of clustering or variable selection approach to choose a small 
subset of stocks from the investment universe. Complete the process by 
assigning weights to the selected stocks using the following simple rule: 
Mimic the weighting method used in the index. For example, for a market- 
value-weighted index, assign the weight of each selected stock according to 
the market value of all the stocks that it represents. You can also attempt 
to replicate various attributes of the market index, such as exposure to 
particular sectors, industries, etc. 

(3) Compare the performance of the constructed fund and that of the actual 
stock market index. To this end, test the results of your model(s) on out-of- 
sample data. This is more interesting if done via a rolling-time window. 

(4) Construct index funds with different numbers of stocks. Compare their
performance.

(5) Study the effect of rebalancing the index fund by periodical adjustment of 
the weights of the selected stocks. Try different periods for rebalancing: one 
week, one month, etc.

(6) Propose some alternative models and compare their results with those of
the basic model. You may want to consider combinations of the following 
possible variations. Be as creative as you wish: 

(i) Use a weighted objective function in the stock selection problem. 

(ii) Use a second optimization problem to assign the weights. For example, 
you can set the weights to minimize the tracking error (active variance) 
between the fund and the overall index. 

(iii) Match attributes of the index such as beta, and/or exposures to factors 
such as industries, sectors, etc.



Stochastic Programming: 
Theory and Algorithms 


Stochastic optimization is concerned with optimizing decision variables under 
uncertainty. As an example, Markowitz’s mean-variance model can be seen as 
a stochastic optimization model. Stochastic optimization covers a wide class of 
models in a variety of disciplines. It is often associated with the terms dynamic 
programming, stochastic programming, and stochastic control, among others. We 
devote several chapters to this important and vast topic. This chapter concen- 
trates on single-period/two-stage models. This provides a foundation for more 
general multi-period/multi-stage models that will be discussed in Chapters 12
through 16.


Examples of Stochastic Optimization Models 


The next three examples inherently involve making decisions under uncertainty. 


Example 10.1 (The newsvendor problem) A vendor purchases a particular 
commodity to satisfy some demand that occurs later over some time period. 
The demand D is random. The per-unit ordering cost, back-ordering cost, and 
holding costs are known to be c, p, and h, respectively. The total cost incurred 
by the vendor if he purchases x units and the demand turns out to be D is 


\[
F(x, D) = c \cdot x + p \cdot \max(D - x, 0) + h \cdot \max(x - D, 0).
\]


The problem is to decide the order quantity x that minimizes the expected total 
cost E[F (x, D)]. 


Example 10.2 (Utility-based optimization) An investor with endowment $W_0$
needs to decide how to invest this initial capital over a planning horizon. The
investor's preferences for her final wealth $W$ are expressed via a concave utility
function $U(W)$. Assume $r$ is the vector of random returns on the assets that the
investor can purchase over the planning horizon. The investor wishes to choose
a portfolio $x \in \mathcal{X}$ that maximizes the expected utility $E[U(W)]$ of her final
wealth $W$:

\[
W = W_0 (1 + r^\top x).
\]

Example 10.3 (Optimal consumption and investment) An individual may
consume some portion $C_0$ of her initial endowment $W_0$ now and invest the remaining
capital $W_0 - C_0$ for consumption at a future time. Assume $r$ is the vector of
random returns on the assets in which she can invest her remaining capital
$W_0 - C_0$. Investing in a portfolio $x$ will thus produce a random wealth $W =
(W_0 - C_0)(1 + r^\top x)$. What should her consumption $C_0$ and investment decisions
$x \in \mathcal{X}$ be to maximize her total expected utility

\[
U_0(C_0) + E[U_1(W)]?
\]


Two-Stage Stochastic Optimization 


Consider the following generic type of optimization problem under uncertainty.
At time 0 we need to make a set of decisions $x$ subject to some constraint set
$\mathcal{X}$. Between time 0 and time 1 a random outcome $\omega$ is revealed. Our goal is to
choose $x$ to minimize the expectation of some objective function $F(x, \omega)$ that
depends both on $x$ and on the random outcome $\omega$. This generic stochastic
optimization problem has the following formal formulation:

\[
\begin{array}{ll}
\min_{x} & E(F(x, \omega)) \\
\text{s.t.} & x \in \mathcal{X}.
\end{array}
\tag{10.1}
\]

The particular form of $F(x, \omega)$ may define various types of problems, as we saw
in Examples 10.1, 10.2, and 10.3. The function $F(x, \omega)$ could be more involved:
In a problem with recourse the function $F(x, \omega)$ depends on decisions that can
be made after the uncertainty $\omega$ is revealed.

Stochastic optimization with recourse is a refinement of the generic
formulation (10.1). In this class of problems a first set of decisions $x$ must be made here
and now at time 0. Between time 0 and time 1 a random outcome $\omega$ occurs.
Then at time 1 we have the opportunity to make a new round of wait-and-see
decisions $y(\omega)$ after the random $\omega$ is revealed. This leads to a two-stage stochastic
optimization with recourse problem formally stated as follows:

\[
\begin{array}{ll}
\min_{x} & f(x) + E[Q(x, \omega)] \\
\text{s.t.} & x \in \mathcal{X}.
\end{array}
\tag{10.2}
\]


The recourse term $Q(x, \omega)$ depends on the initial set of decisions $x$ and on the
random outcome $\omega$, and it is of the form

\[
\begin{array}{rl}
Q(x, \omega) = \min & g(y(\omega)) \\
\text{s.t.} & y(\omega) \in \mathcal{Y}(x, \omega).
\end{array}
\]

The set of decisions $y(\omega)$ are the recourse decisions. They are adaptive to the
random outcome $\omega$. This means that unlike $x$ they are allowed to depend on $\omega$.

Example 10.4 (Newsvendor problem revisited) In this case the total cost is

\[
F(x, D) = c \cdot x + p \cdot \max(D - x, 0) + h \cdot \max(x - D, 0).
\]

We want to solve

\[
\min_{x} E[F(x, D)] = \min_{x} \left( c \cdot x + E[p \cdot \max(D - x, 0) + h \cdot \max(x - D, 0)] \right).
\]

This fits the form (10.2) with $f(x) = c \cdot x$, where the recourse term $Q(x, D)$ is

\[
\begin{array}{rl}
Q(x, D) := \min_{y, z} & p y + h z \\
\text{s.t.} & y \ge D - x \\
& z \ge x - D \\
& y, z \ge 0.
\end{array}
\]

Note that here the recourse decisions $y$ and $z$ are easy to compute once the
demand $D$ and the number of units purchased $x$ are known, namely $y = (D - x)^+$
and $z = (x - D)^+$.


Sometimes it is preferable to consider a more general model obtained by
replacing the objective $E(F(x, \omega))$ in (10.1) with $\varrho(F(x, \omega))$ where $\varrho(\cdot)$ is a
real-valued function. In particular, it is common to let $\varrho(\cdot)$ be a risk measure as
illustrated in the following example. We formally define and discuss risk measures
in more detail in Chapter 11.


Example 10.5 (Mean-variance revisited) Let $r$ denote the vector of random
asset returns in an investment universe and let $\mu$ and $V$ denote respectively the
expected value and covariance matrix of $r$. The classic mean-variance model

\[
\begin{array}{ll}
\min_{x} & \tfrac{1}{2} \gamma \cdot x^\top V x - \mu^\top x \\
\text{s.t.} & x \in \mathcal{X}
\end{array}
\]

can be written as the stochastic optimization problem

\[
\begin{array}{ll}
\min_{x} & \varrho(r^\top x) \\
\text{s.t.} & x \in \mathcal{X}
\end{array}
\]

for the risk measure $\varrho$ defined by

\[
\varrho(Z) = \tfrac{1}{2} \gamma \cdot \sigma^2(Z) - E(Z).
\]
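The equivalence in Example 10.5 can be checked on a small scenario set. The sketch below is illustrative only: the three return scenarios, the weights, and the value of the risk-aversion parameter are made-up assumptions, and the covariance matrix is the probability-weighted one implied by the scenarios. It evaluates the risk measure on the portfolio return and compares it with the mean-variance objective.

```python
def risk_measure(z_values, probs, gamma):
    """rho(Z) = (gamma/2)*var(Z) - E(Z) for a discrete random variable Z."""
    mean = sum(p * z for z, p in zip(z_values, probs))
    var = sum(p * (z - mean) ** 2 for z, p in zip(z_values, probs))
    return 0.5 * gamma * var - mean


def mean_variance_objective(x, scenarios, probs, gamma):
    """(gamma/2)*x'Vx - mu'x with mu and V computed from the scenarios."""
    n = len(x)
    mu = [sum(p * r[j] for r, p in zip(scenarios, probs)) for j in range(n)]
    V = [[sum(p * (r[i] - mu[i]) * (r[j] - mu[j]) for r, p in zip(scenarios, probs))
          for j in range(n)] for i in range(n)]
    quad = sum(x[i] * V[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * gamma * quad - sum(m * xj for m, xj in zip(mu, x))


# Hypothetical data: three equally likely return scenarios for two assets.
scenarios = [[0.10, 0.02], [-0.05, 0.03], [0.04, 0.01]]
probs = [1 / 3, 1 / 3, 1 / 3]
x = [0.6, 0.4]
gamma = 4.0

portfolio_returns = [sum(rj * xj for rj, xj in zip(r, x)) for r in scenarios]
lhs = risk_measure(portfolio_returns, probs, gamma)
rhs = mean_variance_objective(x, scenarios, probs, gamma)
```

The two numbers agree up to rounding because the variance of $r^\top x$ equals $x^\top V x$ and its mean equals $\mu^\top x$.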


Linear Two-Stage Stochastic Programming 


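A linear two-stage stochastic program is a problem of the form

\[
\begin{array}{ll}
\min_{x} & c^\top x + E[Q(x, \omega)] \\
\text{s.t.} & Ax = b \\
& x \ge 0,
\end{array}
\]

where the recourse term $Q(x, \omega)$ is the value of another linear program:

\[
\begin{array}{rl}
Q(x, \omega) = \min & q(\omega)^\top y(\omega) \\
\text{s.t.} & T(\omega) x + W(\omega) y(\omega) = h(\omega) \\
& y(\omega) \ge 0.
\end{array}
\]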


Here the parameters $q(\omega), T(\omega), W(\omega), h(\omega)$ are random, and $\omega$ represents a
random outcome $\omega \in \Omega$ that is revealed between stage 0 and stage 1. It is
customary and convenient to think of $\omega$ itself as the array of random parameters
$\omega = (q, T, W, h)$. The vector $x$ represents the first-stage decisions. These must be
made without knowing the random draw $\omega$. The vector $y(\omega)$ denotes the
second-stage decisions. These may depend on the random draw $\omega$. To ease notation, this
type of problem is often written in the following form:

\[
\begin{array}{ll}
\min & E[c^\top x + q^\top y] \\
\text{s.t.} & Ax = b \\
& Tx + Wy = h \\
& x \ge 0 \\
& y \ge 0,
\end{array}
\tag{10.3}
\]

but we should keep in mind that the tuple of uncertain parameters $\omega =
(q, T, W, h)$ is revealed between time 0 and time 1 and the recourse variables $y$
may depend on this outcome.


Scenario Optimization 


Scenario optimization is a computational approach to stochastic optimization.
The gist of this approach is to assume a discrete distribution for the random
outcome. More precisely, assume the set of possible random outcomes is a
discrete probability space $\Omega = \{\omega_1, \dots, \omega_S\}$, with probability distribution $p_k =
P(\omega_k)$, $k = 1, \dots, S$. The elements in $\Omega$ are the possible realizations or scenarios

\[
\omega_k = (q_k, T_k, W_k, h_k)
\]

of the stochastic components of the model.
Under this assumption, the stochastic optimization problem (10.3) can be
written as the following deterministic equivalent:

\[
\begin{array}{ll}
\min & c^\top x + \sum_{k=1}^{S} p_k (q_k^\top y_k) \\
\text{s.t.} & Ax = b \\
& T_k x + W_k y_k = h_k \quad \text{for } k = 1, \dots, S \\
& x \ge 0 \\
& y_k \ge 0 \quad \text{for } k = 1, \dots, S.
\end{array}
\tag{10.4}
\]

The deterministic equivalent problem has $S$ copies of the second-stage decision
variables and hence can be significantly larger than the original problem before
we considered the uncertainty of the parameters. Fortunately, the constraint
matrix has a very special sparsity structure that can be exploited as we explain
in Section 10.5 below.


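Example 10.6 (Newsvendor problem revisited) Suppose the demand $D$ in the
newsvendor problem has a discrete distribution. More precisely, suppose the
scenarios for the demand $D$ are $D_1, \dots, D_S$ and $P(D = D_k) = p_k$ for $k = 1, \dots, S$.
Hence the newsvendor problem $\min E[F(x, D)]$ where

\[
F(x, D) = c \cdot x + p \cdot \max(D - x, 0) + h \cdot \max(x - D, 0)
\]

has the following deterministic equivalent:

\[
\begin{array}{ll}
\min_{x, y, z} & c \cdot x + p \cdot \sum_{k=1}^{S} p_k y_k + h \cdot \sum_{k=1}^{S} p_k z_k \\
\text{s.t.} & y_k \ge D_k - x, \quad k = 1, \dots, S \\
& z_k \ge x - D_k, \quad k = 1, \dots, S \\
& y_k, z_k \ge 0, \quad k = 1, \dots, S.
\end{array}
\]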
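For a discrete demand distribution the newsvendor problem can also be solved by direct enumeration, which gives a quick check on any linear programming solution. The sketch below is a minimal illustration under stated assumptions: the order quantity is restricted to $x \ge 0$, the costs satisfy $c + h \ge 0$, and the three demand scenarios and cost values are made up.

```python
def expected_cost(x, demands, probs, c, p, h):
    """E[F(x, D)] = c*x + E[p*max(D - x, 0) + h*max(x - D, 0)] over the scenarios."""
    recourse = sum(pk * (p * max(d - x, 0.0) + h * max(x - d, 0.0))
                   for d, pk in zip(demands, probs))
    return c * x + recourse


def best_order(demands, probs, c, p, h):
    """Minimize the expected cost over x >= 0.

    The expected cost is convex and piecewise linear in x with kinks at the
    demand scenarios, so it suffices to compare x = 0 and x = D_k.
    """
    candidates = [0.0] + list(demands)
    return min(candidates, key=lambda x: expected_cost(x, demands, probs, c, p, h))


# Hypothetical data: three demand scenarios.
demands = [10.0, 20.0, 30.0]
probs = [0.3, 0.4, 0.3]
c, p, h = 1.0, 3.0, 1.0

x_star = best_order(demands, probs, c, p, h)
cost_star = expected_cost(x_star, demands, probs, c, p, h)
```

With these numbers the best order quantity is the middle scenario, 20 units, with expected cost 32. The recourse values $y_k = (D_k - x)^+$ and $z_k = (x - D_k)^+$ are exactly the terms inside the sum.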


*The L-Shaped Method 


The constraint matrix of (10.4) has the following form:

\[
\begin{pmatrix}
A & & & \\
T_1 & W_1 & & \\
\vdots & & \ddots & \\
T_S & & & W_S
\end{pmatrix}
\]

Observe that the blocks $W_1, \dots, W_S$ of the constraint matrix are only
interrelated through the blocks $T_1, \dots, T_S$ which correspond to the first-stage decisions.
In other words, once the first-stage decisions $x$ have been fixed, (10.4) decomposes
into $S$ independent linear programs. The Benders decomposition method is an
algorithm that takes advantage of this type of structure. This method is also
called the L-shaped method in the stochastic programming literature. The idea
behind this method is to solve a “master problem” involving only the variables $x$
and a series of independent “recourse problems” each involving a different vector
of variables $y_k$. The master problem and recourse problems are linear programs.
The size of these linear programs is much smaller than the size of the full model 
(10.4). The recourse problems are solved for a given vector x and their solutions 
are used to generate inequalities that are added to the master problem. Solving 
the new master problem produces a new $x$ and the process is repeated. More
specifically, let us write (10.4) as

\[
\begin{array}{ll}
\min_{x} & c^\top x + P_1(x) + \cdots + P_S(x) \\
\text{s.t.} & Ax = b \\
& x \ge 0
\end{array}
\tag{10.5}
\]

where

\[
\begin{array}{rl}
P_k(x) = \min & p_k q_k^\top y_k \\
\text{s.t.} & W_k y_k = h_k - T_k x \\
& y_k \ge 0
\end{array}
\tag{10.6}
\]

for $k = 1, \dots, S$. The dual of the linear program (10.6) is:

\[
\begin{array}{rl}
P_k(x) = \max & (h_k - T_k x)^\top u_k \\
\text{s.t.} & W_k^\top u_k \le p_k q_k.
\end{array}
\tag{10.7}
\]


For simplicity, assume (10.7) is feasible, which is the case of interest in many
applications. The recourse linear program (10.6) will be solved for a sequence of
vectors $x^i$, for $i = 0, 1, 2, \dots$. The initial vector $x^0$ can be obtained by solving

\[
\begin{array}{ll}
\min_{x} & c^\top x \\
\text{s.t.} & Ax = b \\
& x \ge 0.
\end{array}
\tag{10.8}
\]

For a given vector $x^i$, two possibilities can occur for the recourse linear program
(10.6): either (10.6) has an optimal solution or it is infeasible.

If (10.6) has an optimal solution $y_k^i$, and $u_k^i$ is the corresponding optimal dual
solution, then (10.7) implies that

\[
P_k(x) \ge (u_k^i)^\top (T_k x^i - T_k x) + P_k(x^i).
\]

This inequality, which is called an optimality cut, can be added to the current
master linear program. Initially, the master linear program is just (10.8).

If (10.6) is infeasible, then the dual problem is unbounded. Let $u_k^i$ be a
direction where (10.7) is unbounded, that is, $(h_k - T_k x^i)^\top u_k^i > 0$ and
$W_k^\top u_k^i \le 0$. Since we are only interested in first-stage decisions $x$ that lead to feasible
second-stage decisions $y_k$, the following feasibility cut can be added to the current
master linear program:

\[
(u_k^i)^\top (h_k - T_k x) \le 0.
\]

After solving the recourse problems (10.6) for each $k$, we have the following
upper bound on the optimal value of (10.4):

\[
UB = c^\top x^i + P_1(x^i) + \cdots + P_S(x^i),
\]

where we set $P_k(x^i) = +\infty$ if the corresponding recourse problem is infeasible.
Adding all the optimality and feasibility cuts found so far (for $j = 0, \dots, i$) to
the master linear program, we obtain:

\[
\begin{array}{ll}
\min_{x, z_1, \dots, z_S} & c^\top x + \sum_{k=1}^{S} z_k \\
\text{s.t.} & Ax = b \\
& (u_k^j)^\top (T_k x^j - T_k x) + P_k(x^j) \le z_k \quad \text{for some pairs } (j, k) \\
& (u_k^j)^\top (h_k - T_k x) \le 0 \quad \text{for the remaining pairs } (j, k) \\
& x \ge 0.
\end{array}
\]

Denoting by $x^{i+1}, z_1^{i+1}, \dots, z_S^{i+1}$ an optimal solution to this linear program, we
get a lower bound on the optimal value of (10.4):

\[
LB = c^\top x^{i+1} + z_1^{i+1} + \cdots + z_S^{i+1}.
\]

The Benders decomposition method alternately solves the recourse problems
(10.6) and the master linear program with new optimality and feasibility cuts
added at each iteration until the gap between the upper bound $UB$ and the lower
bound $LB$ falls below a given threshold. It can be shown that $UB - LB$ converges to
zero and indeed reaches zero after finitely many iterations. For details see Birge
and Louveaux (1997, chapter 5).
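The alternation between recourse problems and master problem can be illustrated on the scenario newsvendor problem, where the first-stage decision is a single number. The sketch below is a simplified illustration, not a general implementation, and rests on several assumptions: recourse is always feasible (so no feasibility cuts arise), the order quantity is restricted to the interval from 0 to the largest demand, optimality cuts are written in slope-intercept form using a subgradient of each recourse value, and the one-dimensional master problem is solved by checking the kinks of its piecewise linear objective instead of calling a linear programming solver. All function names and data are hypothetical.

```python
def recourse(x, d, pk, p, h):
    """Value P_k(x) of scenario k's recourse problem and a slope (subgradient)."""
    if x < d:
        return pk * p * (d - x), -pk * p
    return pk * h * (x - d), pk * h


def solve_master(cuts, c, x_max):
    """Minimize c*x + sum_k max_j(a*x + b) over [0, x_max] by checking kinks."""
    def value(x):
        return c * x + sum(max(a * x + b for a, b in ck) for ck in cuts)

    candidates = [0.0, x_max]
    for ck in cuts:
        for i, (a1, b1) in enumerate(ck):
            for a2, b2 in ck[i + 1:]:
                if a1 != a2:
                    x = (b2 - b1) / (a1 - a2)  # intersection of two cuts
                    if 0.0 <= x <= x_max:
                        candidates.append(x)
    x_best = min(candidates, key=value)
    return x_best, value(x_best)


def l_shaped_newsvendor(demands, probs, c, p, h, tol=1e-9, max_iter=100):
    x_max = max(demands)
    x = 0.0                       # minimizes c*x over [0, x_max] when c > 0
    cuts = [[] for _ in demands]  # one list of optimality cuts per scenario
    best_x, upper = x, float("inf")
    for _ in range(max_iter):
        total = c * x
        for k, (d, pk) in enumerate(zip(demands, probs)):
            val, slope = recourse(x, d, pk, p, h)
            total += val
            cuts[k].append((slope, val - slope * x))  # z_k >= slope*x + intercept
        if total < upper:         # keep the best upper bound found so far
            best_x, upper = x, total
        x, lower = solve_master(cuts, c, x_max)
        if upper - lower <= tol:
            break
    return best_x, upper


# Hypothetical data: three demand scenarios.
x_opt, cost_opt = l_shaped_newsvendor([10.0, 20.0, 30.0], [0.3, 0.4, 0.3],
                                      c=1.0, p=3.0, h=1.0)
```

On this data the bounds move from (0, 60) to (32, 40) and then meet at 32, with the order quantity settling at 20 units.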


Exercises 


Exercise 10.1 Consider the utility-based portfolio optimization described in 
Example 10.2. Suppose r has a multivariate normal distribution and the investor 
has logarithmic utility function 


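\[
U(W) = \log(W).
\]

Suppose $\mathcal{X}$ is a convex set. Prove that the utility maximization problem

\[
\begin{array}{ll}
\max & E(U(W)) \\
\text{s.t.} & W = W_0 \cdot (1 + r^\top x) \\
& x \in \mathcal{X}
\end{array}
\]

is equivalent to the mean-variance problem

\[
\begin{array}{ll}
\min & \tfrac{1}{2} \gamma \cdot x^\top V x - \mu^\top x \\
\text{s.t.} & x \in \mathcal{X}
\end{array}
\]

for some suitable level of risk aversion $\gamma$.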


Exercise 10.2 Repeat the above exercise when the investor has power utility
function

\[
U(W) = \frac{1}{1 - \gamma} W^{1 - \gamma}
\]

for some risk-aversion constant $\gamma > 0$, $\gamma \ne 1$.

Exercise 10.3 Consider the newsvendor problem described in Example 10.1.
Suppose the demand $D$ has a continuous cumulative distribution function $\Phi$;
that is, $\Phi(x) = P(D \le x)$. Show that the solution to the newsvendor problem

\[
\min_{x} \left( c \cdot x + E[p \cdot \max(D - x, 0) + h \cdot \max(x - D, 0)] \right)
\]


Exercise 10.4 The purpose of this exercise is to formalize the optimality cut
described in Section 10.5. For $x = x^i$ assume (10.6) has an optimal solution $y_k^i$,
and let $u_k^i$ be the corresponding optimal dual solution.


(a) Show that $P_k(x^i) = (u_k^i)^\top (h_k - T_k x^i)$.
(b) Show that $P_k(x) \ge (u_k^i)^\top (h_k - T_k x)$ for all $x$.
(c) Conclude that $P_k(x) \ge (u_k^i)^\top (T_k x^i - T_k x) + P_k(x^i)$ for all $x$.


Stochastic Programming Models:
Risk Measures 


This chapter discusses several popular risk measures. In particular, we introduce 
two widely used risk measures, value at risk and its refinement conditional 
value at risk. We show that the problem of finding a portfolio that minimizes 
conditional value at risk is amenable to stochastic programming techniques. 


Risk Measures 


In the classical Markowitz model, variance (equivalently standard deviation) is 
used as a measure of risk. This measure of risk is relatively easy to compute, and, 
as we have seen in Chapter 6, leads to a quadratic programming model when we 
are interested in finding efficient portfolios. 

As we illustrate below, variance has some shortcomings as a measure of risk. 
This has motivated the introduction of other risk measures. 


Dispersion Measures 
Let $r$ denote the (random) return of an asset with mean $\mu = E(r)$. The variance

\[
\sigma^2 = \mathrm{var}(r) = E((r - \mu)^2)
\]

is a measure of dispersion of the distribution of $r$. Another dispersion measure
is the mean absolute deviation (MAD) favored by Konno and Yamazaki (1991):

\[
E(|r - \mu|).
\]

For the special case of normally distributed returns, the mean absolute deviation
and the standard deviation are equivalent. Indeed, the following property is a
straightforward exercise in probability:


Proposition 11.1 If $r \sim N(\mu, \sigma^2)$ then $E(|r - \mu|) = \sqrt{2/\pi}\,\sigma$.
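One way to verify Proposition 11.1 is the following short calculation, given here as a sketch. It writes $r = \mu + \sigma Z$ with $Z$ standard normal (the symbol $Z$ is introduced only for this derivation) and uses the symmetry of the normal density.

```latex
\[
E(|r - \mu|) = \sigma\, E(|Z|)
= 2\sigma \int_0^\infty z \,\frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz
= \frac{2\sigma}{\sqrt{2\pi}} \left[ -e^{-z^2/2} \right]_0^\infty
= \frac{2\sigma}{\sqrt{2\pi}}
= \sqrt{\frac{2}{\pi}}\,\sigma .
\]
```

Numerically, $\sqrt{2/\pi} \approx 0.798$, so for normal returns the mean absolute deviation is about 80% of the standard deviation.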


A major difference between mean absolute deviation and standard deviation
is their sensitivity to outliers. The mean absolute deviation is more robust
to outliers. When the distribution of joint returns is represented via a set of
scenarios, the computation of efficient portfolios for the mean absolute deviation
can be formulated as a linear program. This offers an alternative with potential
advantages as we will show next. Suppose the investment universe has $n$ assets
with (random) returns $r_1, r_2, \dots, r_n$. Let $\mu_j = E(r_j)$, $j = 1, \dots, n$.


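Recall the portfolio optimization problem that finds the minimum-variance
portfolio among a set of portfolios $\mathcal{X}$:

\[
\begin{array}{ll}
\min_{x} & \mathrm{var}(r^\top x) = E\left( [(r - \mu)^\top x]^2 \right) \\
\text{s.t.} & x \in \mathcal{X}.
\end{array}
\]

Consider now the model obtained by using instead the mean absolute deviation
as a measure of risk:

\[
\begin{array}{ll}
\min_{x} & E\left( |(r - \mu)^\top x| \right) \\
\text{s.t.} & x \in \mathcal{X}.
\end{array}
\tag{11.1}
\]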

Not only does the computation of efficient portfolios based on formulation (11.1) 
involve solving a linear program as opposed to a quadratic program, but also 
the linear program solves the problem directly over the set of scenarios thereby 
circumventing the estimation of the covariance matrix. 


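Mean Absolute Deviation via Scenario Optimization


Assume the possible scenarios for the vector of returns $r = [r_1 \ \cdots \ r_n]^\top$ are

\[
r^k = [r_1^k \ \cdots \ r_n^k]^\top, \quad k = 1, \dots, S,
\]

and scenario $k$ occurs with probability $p_k$, $k = 1, \dots, S$. Then we can write the
above mean absolute deviation model (11.1) as

\[
\begin{array}{ll}
\min_{x, w} & \sum_{k=1}^{S} p_k w_k \\
\text{s.t.} & w_k = |(r^k - \mu)^\top x| \quad \text{for } k = 1, \dots, S \\
& x \in \mathcal{X}.
\end{array}
\]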
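For a fixed portfolio the scenario objective above is a plain weighted sum, as the following minimal sketch shows. The function name and the two-asset, three-scenario data are illustrative assumptions; the mean vector is computed from the same scenarios.

```python
def portfolio_mad(x, scenarios, probs):
    """Mean absolute deviation sum_k p_k * |(r^k - mu)' x| of the portfolio return."""
    n = len(x)
    mu = [sum(p * r[j] for r, p in zip(scenarios, probs)) for j in range(n)]
    deviations = [sum((r[j] - mu[j]) * x[j] for j in range(n)) for r in scenarios]
    return sum(p * abs(w) for p, w in zip(probs, deviations))


# Hypothetical data: three return scenarios for two assets.
scenarios = [[0.10, 0.02], [-0.05, 0.03], [0.04, 0.01]]
probs = [0.25, 0.25, 0.50]

mad_asset_1 = portfolio_mad([1.0, 0.0], scenarios, probs)
mad_mix = portfolio_mad([0.5, 0.5], scenarios, probs)
```

On this data holding only the first asset gives a mean absolute deviation of 0.04125, while the equally weighted mix gives 0.0175.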


We now turn this formulation into a linear program as follows:

\[
\begin{array}{ll}
\min_{x, w} & \sum_{k=1}^{S} p_k w_k \\
\text{s.t.} & w_k \ge (r^k - \mu)^\top x \quad \text{for } k = 1, \dots, S \\
& w_k \ge -(r^k - \mu)^\top x \quad \text{for } k = 1, \dots, S \\
& x \in \mathcal{X}.
\end{array}
\]

Note that, because $p_k > 0$ for $k = 1, \dots, S$ and the objective is minimized, $w_k$ in
an optimal solution satisfies at equality the constraint with the larger right-hand
side, that is, $w_k = |(r^k - \mu)^\top x|$.


Downside Risk Measures


Dispersion measures, such as variance and mean absolute deviation, measure the 
degree of uncertainty in the random return. These measures treat both positive 
and negative deviations from the mean as equally risky. In particular, these types 
of measures are blind to skewed distributions. 

We will next discuss two popular downside risk measures: value at risk and
conditional value at risk. Value at risk (VaR) was first introduced by a team
at J.P. Morgan and made available through RiskMetrics. VaR is used by many 
financial institutions to track and report the market risk exposure of their trading 
portfolios. 

VaR is a measure of the worst possible loss that a portfolio may sustain with 
a pre-specified likelihood. For that reason, VaR is generally measured in dollar 
terms, instead of percentage units. The formal definition is as follows. Assume
that $Y$ is a (random) loss function, and $\alpha \in (0,1)$ is a confidence level (typically
99%, 95%, or 90%). The $\alpha$ value at risk of $Y$ is the upper $(1 - \alpha)$ quantile of $Y$; that
is, the value $\gamma$ such that

\[
P(Y \ge \gamma) = 1 - \alpha.
\]

We shall denote this value by $\mathrm{VaR}_\alpha(Y)$.

The value at risk has the following interpretation. Given a loss function $Y$ and
a confidence level $\alpha \in (0,1)$, the loss $Y$ will exceed $\gamma$ with probability $(1 - \alpha)$.
In the special case when the loss function is normally distributed, it is easy to
compute VaR via well-known quantiles of the normal distribution.


Example 11.2 If $Y \sim N(\mu, \sigma^2)$ then

\[
\mathrm{VaR}_{0.95}(Y) = \mu + 1.645\sigma, \qquad \mathrm{VaR}_{0.99}(Y) = \mu + 2.33\sigma.
\]


When Y has a discrete distribution, VaR can be computed by sorting the 
values of Y as detailed in the following example. 


Example 11.3 Assume there are $S$ possible scenarios for the loss $Y$:

\[
P(Y = y_k) = p_k, \quad k = 1, \dots, S,
\]

where

\[
y_1 \le y_2 \le \cdots \le y_S.
\]

Then

\[
\mathrm{VaR}_\alpha(Y) = y_K,
\]

where $K$ is the largest index such that

\[
\sum_{i=K}^{S} p_i \ge 1 - \alpha.
\]

In spite of its wide popularity, VaR is known to have the following two major
shortcomings (see the exercises at the end of this chapter): 


• VaR is not “subadditive”: The VaR of two positions combined may be greater
than the sum of the VaR of each, meaning that diversification can actually
increase VaR.

• VaR does not distinguish loss size beyond the VaR threshold.


These deficiencies of VaR led Artzner et al. (1999) to propose the following formal
set of properties that a reasonable risk measure $\rho(Y)$ of a loss function $Y$ should
satisfy:


• Monotonicity: If $Y \ge 0$ then $\rho(Y) \ge 0$.

• Subadditivity: $\rho(Y + Z) \le \rho(Y) + \rho(Z)$.

• Positive homogeneity: For $c \ge 0$, $\rho(cY) = c\rho(Y)$.

• Translational invariance: For any $c \in \mathbb{R}$, $\rho(Y + c) = \rho(Y) + c$.


A risk measure is coherent if it satisfies the above four properties. Neither
standard deviation nor VaR is coherent. However, there is a modification of VaR
that is coherent, namely the conditional value at risk introduced by Rockafellar
and Uryasev (2000). Conditional value at risk (CVaR) is also known as expected
tail loss.

CVaR can be motivated as follows. Since $\mathrm{VaR}_\alpha(Y)$ is the most we can lose
with probability $\alpha$, it is equivalent to saying that with probability $(1 - \alpha)$ the
loss $Y$ will be at least $\mathrm{VaR}_\alpha(Y)$. CVaR is the answer to the following question:
What should we expect the value of that loss to be? More precisely, CVaR is
defined as follows. Given a loss function $Y$ and confidence level $\alpha \in (0,1)$, the
conditional value at risk is the expected loss $Y$, conditional on this loss being at
least $\mathrm{VaR}_\alpha(Y)$:

\[
E(Y \mid Y \ge \mathrm{VaR}_\alpha(Y)).
\]

We shall denote this expected value as $\mathrm{CVaR}_\alpha(Y)$.

Again, in the special case when the loss function is normally distributed, it is 
easy to compute CVaR by using properties of the quantiles and expected tails 
of the normal distribution. 


Example 11.4 If $Y \sim N(\mu, \sigma^2)$ then

\[
\mathrm{CVaR}_{0.95}(Y) = \mu + 2.06\sigma, \qquad \mathrm{CVaR}_{0.99}(Y) = \mu + 2.67\sigma.
\]
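The multipliers in Examples 11.2 and 11.4 can be reproduced with Python's standard library. The sketch below uses `statistics.NormalDist`; the function names are illustrative, and the CVaR formula relies on the standard expression for the mean of the upper tail of a normal distribution, namely the density at the quantile divided by the tail probability.

```python
from statistics import NormalDist


def normal_var(mu, sigma, alpha):
    """VaR_alpha of a N(mu, sigma^2) loss: the value gamma with P(Y >= gamma) = 1 - alpha."""
    return NormalDist(mu, sigma).inv_cdf(alpha)


def normal_cvar(mu, sigma, alpha):
    """CVaR_alpha of a N(mu, sigma^2) loss: mu + sigma * pdf(z_alpha) / (1 - alpha)."""
    z = NormalDist().inv_cdf(alpha)
    return mu + sigma * NormalDist().pdf(z) / (1.0 - alpha)


# Multipliers of sigma for a standard normal loss (mu = 0, sigma = 1).
var_95 = normal_var(0.0, 1.0, 0.95)
var_99 = normal_var(0.0, 1.0, 0.99)
cvar_95 = normal_cvar(0.0, 1.0, 0.95)
cvar_99 = normal_cvar(0.0, 1.0, 0.99)
```

The computed values round to the multipliers quoted in the two examples.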


When Y has a discrete distribution, CVaR can be computed by sorting the 
values of Y. 




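Example 11.5 Assume $Y$ takes values $y_k$, $k = 1, \dots, S$, in $S$ possible scenarios:

\[
P(Y = y_k) = p_k, \quad k = 1, \dots, S,
\]

where

\[
y_1 \le y_2 \le \cdots \le y_S.
\]

Then

\[
\mathrm{CVaR}_\alpha(Y) = \frac{1}{1 - \alpha} \sum_{i=K}^{S} p_i y_i,
\]

where $K$ is the largest index such that

\[
\sum_{i=K}^{S} p_i \ge 1 - \alpha.
\]

Note that here we may have to split the probability $p_K$ so that the probabilities
in the sum add up to exactly $1 - \alpha$.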


A Key Property of CVaR 


We next present a key property of CVaR that makes it possible to solve portfolio 
optimization problems with CVaR via convex optimization. 


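Proposition 11.6 Assume $Y$ is a loss function. Then for $\alpha \in (0,1)$

\[
\mathrm{CVaR}_\alpha(Y) = \min_{\gamma} \left\{ \gamma + \frac{1}{1 - \alpha} E[\max(Y - \gamma, 0)] \right\}.
\]

Furthermore, the optimal solution (i.e., the minimizer) $\gamma$ of this problem is
$\mathrm{VaR}_\alpha(Y)$.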


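As a consequence of Proposition 11.6, it follows that CVaR is subadditive.
Indeed, CVaR is a coherent risk measure (see exercises at the end of the chapter
for details). Another consequence of Proposition 11.6 is that CVaR can be
computed as the following linear two-stage stochastic program:

\[
\mathrm{CVaR}_\alpha(Y) = \min_{\gamma} \left[ \gamma + E(Q(\gamma, Y)) \right],
\]

where

\[
\begin{array}{rl}
Q(\gamma, Y) := \min_{z} & \frac{1}{1 - \alpha} z \\
\text{s.t.} & z \ge Y - \gamma \\
& z \ge 0.
\end{array}
\]

More concisely

\[
\begin{array}{rl}
\mathrm{CVaR}_\alpha(Y) = \min_{\gamma, z} & \gamma + \frac{1}{1 - \alpha} E(z) \\
\text{s.t.} & z \ge Y - \gamma \\
& z \ge 0.
\end{array}
\]

In this formulation the first-stage and second-stage decision variables are $\gamma$ and $z$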
respectively. Notice that $z$ is adapted to the random outcome $Y$. In the particular
case when $Y$ is discrete and takes values $y_k$, for $k = 1, \dots, S$, in $S$ possible
scenarios (not necessarily sorted):

\[
P(Y = y_k) = p_k, \quad k = 1, \dots, S,
\]

we obtain the following linear programming formulation for $\mathrm{CVaR}_\alpha(Y)$.
Variables:

\[
\gamma, z_1, \dots, z_S.
\]

Linear programming formulation of CVaR:

\[
\begin{array}{ll}
\min & \gamma + \frac{1}{1 - \alpha} \sum_{k=1}^{S} p_k z_k \\
\text{s.t.} & z_k \ge y_k - \gamma, \quad \text{for } k = 1, \dots, S \\
& z_k \ge 0, \quad \text{for } k = 1, \dots, S.
\end{array}
\]

An advantage of this formulation is that it allows us to minimize the 
CVaR of a portfolio via linear programming as we next explain. 
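For a discrete loss the minimization in Proposition 11.6 can be carried out without a linear programming solver, which gives a simple way to cross-check results. The sketch below is a minimal illustration with made-up data and hypothetical function names. It uses the fact that the objective is convex and piecewise linear in the threshold, with kinks at the loss values, so a minimizer can be found among them.

```python
def cvar_objective(gamma, losses, probs, alpha):
    """gamma + E[max(Y - gamma, 0)] / (1 - alpha) for a discrete loss Y."""
    tail = sum(p * max(y - gamma, 0.0) for y, p in zip(losses, probs))
    return gamma + tail / (1.0 - alpha)


def discrete_cvar(losses, probs, alpha):
    """Minimize the objective over gamma; return (CVaR, a minimizing gamma).

    At the minimizing gamma, the LP variables are z_k = max(y_k - gamma, 0).
    """
    best = min(losses, key=lambda g: cvar_objective(g, losses, probs, alpha))
    return cvar_objective(best, losses, probs, alpha), best


# Hypothetical data: ten equally likely losses 1, 2, ..., 10.
losses = [float(v) for v in range(1, 11)]
probs = [0.1] * 10

cvar_80, gamma_80 = discrete_cvar(losses, probs, 0.8)
```

With a confidence level of 80% the result is 9.5, the average of the two largest losses. The minimizing threshold is not unique here: every value between 8 and 9 attains the minimum.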


Portfolio Optimization with CVaR 


The discussion in this section is based on Andersson et al. (2001). This study 
uses CVaR for measuring and controlling the credit risk of a portfolio of bonds. 
The loss function of interest is the loss due to credit risk; that is, the loss that the 
portfolio may suffer due to default or credit migration in its positions. This type 
of loss function is characterized by having a large likelihood of no loss and a small 
likelihood of a substantial loss. The loss distribution is heavily skewed. In this 
case, standard mean-variance analysis to characterize market risk is inadequate. 
VaR and CVaR are more appropriate criteria for minimizing portfolio credit risk. 


Distribution of Future Values for One Single Bond 


Consider a risky bond and a fixed time horizon, e.g., one year. The future value of 
the bond depends on the forward curve that applies to its coupon payments. The 
forward curve in turn depends on the current rating of the bond. The benchmark 
future value of the bond is the future value of the bond if there is no change in
its credit rating. However, in the event of credit migration, the future value of
the bond may differ from the benchmark value. In particular, if the credit rating
deteriorates, the coupon payments will be discounted at higher rates and
the future value of the bond will be lower than its benchmark value.

For a concrete illustration, suppose the one-year forward interest curves for
the S&P credit ratings are as follows:

Category   Year 1   Year 2   Year 3   Year 4

AAA        0.036    0.0417   0.0473   0.0512
AA         0.0365   0.0422   0.0478   0.0517
A          0.0372   0.0432   0.0493   0.0532
BBB        0.041    0.0467   0.0525   0.0563
BB         0.0555   0.0602   0.0678   0.0727
B          0.0605   0.0702   0.0803   0.0852
CCC        0.1505   0.1502   0.1403   0.1352


Suppose the probabilities of credit rating migration for A, BBB, and B in one 
year are as follows: 


Rating at year end 
Initial rating AAA AA A BBB BB B CCC Default 


A 0.09% 2.27% 91.05% 5.52% 0.74% 0.26% 0.01% 0.06% 
BBB 0.02% 0.33% 5.95% 86.93% 5.30% 1.17% 0.12% 0.18% 
B 0.00% 0.11% 0.24% 0.43% 6.48% 83.47% 4.07% 5.20% 


Assuming a 50% recovery rate in default, the possible future values of a five- 
year, 6% BBB bond with face value 100 are as follows: 


Year-end rating Future value Probability 


AAA 109.352908 0.0002 
AA 109.1723709 0.0033 
A 108.6429921 0.0595 
BBB 107.5309439 0.8693 
BB 102.0063855 0.053 
B 98.08591318 0.0117 
CCC 83.6257912 0.0012 
Default 50 0.0018 


For example, for BBB rated bonds, the future value 107.5309439 was obtained 
as follows: 

\[
107.5309439 = 6 \cdot \left(1 + \frac{1}{1.041} + \frac{1}{1.0467^2} + \frac{1}{1.0525^3} + \frac{1}{1.0563^4}\right) + 100 \cdot \frac{1}{1.0563^4}.
\]
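
The same calculation can be repeated for every year-end rating. The following is a minimal Python sketch, assuming annual coupons and the forward curves tabulated above; the names `FORWARD_CURVES`, `future_value`, and `future_values` are ours and purely illustrative.

```python
# One-year forward curves by rating (years 1-4), copied from the table above.
FORWARD_CURVES = {
    "AAA": [0.036, 0.0417, 0.0473, 0.0512],
    "AA":  [0.0365, 0.0422, 0.0478, 0.0517],
    "A":   [0.0372, 0.0432, 0.0493, 0.0532],
    "BBB": [0.041, 0.0467, 0.0525, 0.0563],
    "BB":  [0.0555, 0.0602, 0.0678, 0.0727],
    "B":   [0.0605, 0.0702, 0.0803, 0.0852],
    "CCC": [0.1505, 0.1502, 0.1403, 0.1352],
}


def future_value(curve, coupon=6.0, face=100.0):
    """Value in one year: the coupon paid then, plus the discounted remaining cash flows."""
    value = coupon  # coupon received at the one-year horizon
    last = len(curve)
    for t, rate in enumerate(curve, start=1):
        cash = coupon + (face if t == last else 0.0)
        value += cash / (1.0 + rate) ** t
    return value


def future_values(recovery=0.5, face=100.0):
    """Future value of the five-year, 6% bond for each possible year-end rating."""
    values = {rating: future_value(curve) for rating, curve in FORWARD_CURVES.items()}
    values["Default"] = recovery * face  # assumed recovery rate in default
    return values


if __name__ == "__main__":
    for rating, value in future_values().items():
        print(f"{rating:8s}{value:12.4f}")
```

Note that the future value depends only on the year-end rating; the initial rating enters through the migration probabilities.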


Credit Risk Optimization for a Portfolio of Bonds 


Now suppose we construct a portfolio of risky bonds. Assume there are $n$ risky 
bonds and let $x_j$ be the percentage of the portfolio invested in bond $j$. Then the loss 
function of our portfolio is 

\[
Y(x) := (b - w)^\top x = \sum_{j=1}^{n} (b_j - w_j)\, x_j,
\]

where each $b_j$ is the future value of bond $j$ with no credit migration, and 
$w_j$ is the (random) possible future value of bond $j$ with credit migration. 
Suppose we want to select the portfolio in the constraint set $\mathcal{X}$ with minimum 
CVaR. In other words, we want to solve 

\[
\min_{x \in \mathcal{X}} \ \mathrm{CVaR}_\alpha(Y(x)).
\]

Suppose that the possible scenarios for the vector of future bond values 
$w = [w_1 \ \cdots \ w_n]^\top$ are 

\[
w^k = [w_1^k \ \cdots \ w_n^k]^\top, \quad k = 1, \dots, S,
\]

where $p_k$ denotes the probability of scenario $k$. Then by Proposition 11.6, this problem has the following formulation: 

\[
\begin{array}{lll}
\min\limits_{\gamma,\, x,\, z} & \gamma + \dfrac{1}{1-\alpha} \displaystyle\sum_{k=1}^{S} p_k z_k & \\
\text{s.t.} & z_k \ge (b - w^k)^\top x - \gamma, & k = 1, \dots, S \\
 & z_k \ge 0, & k = 1, \dots, S \\
 & x \in \mathcal{X} & \\
 & \gamma \ \text{free}. &
\end{array}
\tag{11.2}
\]

If the constraint set $\mathcal{X}$ is defined by linear constraints, then (11.2) is a linear 
program. 
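
For a fixed portfolio, the minimization over $\gamma$ and $z$ in (11.2) reduces to a one-dimensional problem that can be checked by hand. The following Python sketch does this for a portfolio holding only the BBB bond of the previous section, taking $\alpha = 0.99$ as an illustrative choice. It computes CVaR in two ways: as the average of the worst $(1-\alpha)$ tail, and by minimizing $\gamma + \frac{1}{1-\alpha}\sum_k p_k \max(Y_k - \gamma, 0)$ over $\gamma$. The function names are ours.

```python
# Future values and probabilities of the BBB bond, copied from the table above.
FUTURE_VALUES = [109.352908, 109.1723709, 108.6429921, 107.5309439,
                 102.0063855, 98.08591318, 83.6257912, 50.0]
PROBS = [0.0002, 0.0033, 0.0595, 0.8693, 0.053, 0.0117, 0.0012, 0.0018]
BENCHMARK = 107.5309439  # future value if the bond remains BBB

LOSSES = [BENCHMARK - v for v in FUTURE_VALUES]


def var_cvar_tail(losses, probs, alpha):
    """VaR as the alpha-quantile; CVaR as the average of the worst (1 - alpha) tail."""
    pairs = sorted(zip(losses, probs))
    cum = 0.0
    var = pairs[-1][0]
    for loss, p in pairs:
        cum += p
        if cum >= alpha:
            var = loss
            break
    tail_mass = sum(p for loss, p in pairs if loss > var)
    tail_sum = sum(p * loss for loss, p in pairs if loss > var)
    # The scenario at VaR contributes only the mass needed to fill the tail.
    cvar = (tail_sum + (1.0 - alpha - tail_mass) * var) / (1.0 - alpha)
    return var, cvar


def cvar_by_minimization(losses, probs, alpha):
    """Minimize gamma + E[max(Y - gamma, 0)] / (1 - alpha) over gamma."""
    def g(gamma):
        excess = sum(p * max(loss - gamma, 0.0) for loss, p in zip(losses, probs))
        return gamma + excess / (1.0 - alpha)
    # g is convex and piecewise linear, so a minimizer lies at a scenario loss.
    return min(g(gamma) for gamma in losses)


if __name__ == "__main__":
    var, cvar = var_cvar_tail(LOSSES, PROBS, 0.99)
    print(var, cvar, cvar_by_minimization(LOSSES, PROBS, 0.99))
```

Both computations agree, which is the content of Proposition 11.6 in this small discrete case. Optimizing over the portfolio $x$ as well requires a linear programming solver, which this sketch deliberately leaves out.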


Scenario Generation in the Credit-Risk Example 


When there is a single bond, the probability distribution of the possible future 
values of the bond depends on the probability of credit migration and the bond 
value in each of these scenarios. For instance, for the S&P ratings, the scenarios 
correspond to the ratings AAA, AA, A, BBB, BB, B, CCC, and default. The 
likelihood of each of these scenarios is given by the migration matrix, which 
estimates the probability of migrating from one rating to the others over a 
specified time period. 

The discrete distribution readily yields the set of possible scenarios for the 
bond. Scenarios can also be generated via normal sampling as Figure 11.1 sug- 
gests (assuming we are working with a BB bond). 


[Figure 11.1. Horizontal axis: asset return over one year, with thresholds Z_Def, Z_CCC, Z_B, Z_BBB, Z_A, Z_AA. Labeled regions: "Firm defaults", "Downgrade to B", "Firm remains BB rated", "Upgrade to BBB".] 

More precisely, normal sampling goes as follows: 

• compute the Z-scores associated with the probabilities of each of the scenarios, 
• draw samples from a standard normal distribution, 
• use the Z-scores to determine the sampled scenario. 
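
The three steps above can be sketched in Python as follows. Since the migration table above lists a BBB row but not a BB row, this illustration uses the BBB probabilities; the thresholds are the standard normal quantiles of the cumulative probabilities, ordered from default upward. All names are ours.

```python
import bisect
import random
from statistics import NormalDist

# Year-end states ordered from worst to best, with the BBB migration probabilities.
RATINGS = ["Default", "CCC", "B", "BB", "BBB", "A", "AA", "AAA"]
PROBS = [0.0018, 0.0012, 0.0117, 0.053, 0.8693, 0.0595, 0.0033, 0.0002]


def z_thresholds(probs):
    """Z-scores separating consecutive states (one fewer than the number of states)."""
    std_normal = NormalDist()
    cumulative, thresholds = 0.0, []
    for p in probs[:-1]:  # the best state is unbounded above
        cumulative += p
        thresholds.append(std_normal.inv_cdf(cumulative))
    return thresholds


THRESHOLDS = z_thresholds(PROBS)


def rating_from_z(z):
    """Map a standard normal draw to the state whose interval contains it."""
    return RATINGS[bisect.bisect_left(THRESHOLDS, z)]


def sample_ratings(n, seed=0):
    """Draw n year-end ratings by normal sampling."""
    rng = random.Random(seed)
    return [rating_from_z(rng.gauss(0.0, 1.0)) for _ in range(n)]


if __name__ == "__main__":
    draws = sample_ratings(20000)
    print({r: draws.count(r) / len(draws) for r in RATINGS})
```

With enough samples, the empirical frequencies approach the migration probabilities.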


Some interesting challenges arise in scenario generation when we need 
to work with multiple bonds. Under the simple assumption that the credit 
migrations are statistically independent, we can generate scenarios via discrete 
sampling or independent normal sampling. Notice that, although discrete, the 
joint probability distribution for a set of ten or more bonds has an extremely large 
support: with eight possible year-end states per bond, ten bonds already yield 
$8^{10} \approx 1.07 \times 10^9$ joint scenarios. Hence it is generally impractical to 
exhaustively generate the entire set of scenarios. 

In the case when credit migrations are correlated, the scenario generation 
problem becomes more interesting. In this case a possible solution is to use 
correlated normal sampling. That is, draw samples from a correlated 
multivariate normal random vector. Then map each of the components of the random 
sample to a possible credit rating of the corresponding bond. Some statistical packages, like 
the Statistics Toolbox in MATLAB, readily provide routines to sample from 
correlated multivariate normal variables. However, it is also easy to generate correlated 
normal samples from independent normal samples. More precisely, to sample 
from a general $n$-dimensional normal distribution $N(\mu, V)$, proceed as follows: 

• Let $LL^\top = V$ be the Cholesky factorization of the covariance matrix $V$. 
• Sample $n$ independent standard normals $x_i \sim N(0, 1)$, $i = 1, \dots, n$. 
• Put $y = \mu + Lx$. 
• The resulting variable $y$ has the desired distribution $y \sim N(\mu, V)$. 
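
The procedure above can be sketched in plain Python. This is a minimal illustration rather than a production routine: it assumes $V$ is symmetric positive definite and performs no checks; the helper names `cholesky` and `sample_mvn` are ours.

```python
import math
import random


def cholesky(V):
    """Lower-triangular L with L L^T = V (V assumed symmetric positive definite)."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(V[i][i] - s)
            else:
                L[i][j] = (V[i][j] - s) / L[j][j]
    return L


def sample_mvn(mu, L, rng):
    """One draw y = mu + L x, where x has independent standard normal entries."""
    n = len(mu)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [mu[i] + sum(L[i][k] * x[k] for k in range(i + 1)) for i in range(n)]


if __name__ == "__main__":
    V = [[4.0, 2.0], [2.0, 3.0]]
    L = cholesky(V)
    rng = random.Random(0)
    print(L)
    print(sample_mvn([0.0, 0.0], L, rng))
```

Each component of a draw can then be mapped to a credit rating through the Z-score thresholds of the corresponding bond.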


Solution of a Real-World Bond Example 


Andersson et al. (2001) considered a portfolio of 197 bonds from 29 different 
countries with a market value of $8.8 billion and duration of approximately 
five years. Their goal was to rebalance the portfolio in order to minimize credit 
risk. The one-year portfolio credit loss was generated using a Monte Carlo 
simulation: 20,000 scenarios of joint credit states of obligors and related losses. 
The distribution of portfolio losses had a long fat tail, as expected. The authors 
rebalanced the portfolio by minimizing CVaR using formulation (11.2). For 
$\alpha = 99\%$, the original bond portfolio had an expected portfolio return of 7.26%. The 
expected loss was 95 million dollars with a standard deviation of 232 million. The 
VaR was 1.03 billion dollars and the CVaR was 1.32 billion. After optimizing the 
portfolio (with expected return of 7.26%), the expected loss was only 5000 dollars, 
with a standard deviation of 152 million. The VaR was reduced to 210 million and 
the CVaR to 263 million dollars. So all around, the characteristics of the portfolio 
were much improved. Positions were reduced in bonds from Brazil, Russia, and 
Venezuela, whereas positions were increased in bonds from Thailand, Malaysia, 
and Chile. Positions in bonds from Colombia, Poland, and Mexico remained high 
and each accounted for about 5% of the optimized portfolio. 


Notes 


As early as the 1970s and 1980s, some major financial institutions developed 
internal systems for risk management. The best known of these systems was 
RiskMetrics developed in the late 1980s at J.P. Morgan when chairman Dennis 
Weatherstone requested his staff provide a “4:15pm” daily one-page report 
measuring and explaining the risks and potential losses over the next 24 hours 
across the bank’s entire portfolio. The RiskMetrics system featured and 
popularized the use of value at risk as a risk measure. The interest in a rigorous 
treatment of risk measures led a set of prominent scholars to develop a formal 
theory of coherent measures of risk in a landmark paper (Artzner et al., 1999). 
Conditional value at risk is one of the most popular coherent measures of risk. 


Exercises 


Exercise 11.1 Construct a counterexample to show that VaR is not necessarily 
subadditive. In other words, construct two loss functions $Y$, $Z$ and a confidence 
level $\alpha \in (0,1)$ so that 

\[
\mathrm{VaR}_\alpha(Y + Z) > \mathrm{VaR}_\alpha(Y) + \mathrm{VaR}_\alpha(Z).
\]


Exercise 11.2 Show that a coherent risk measure $\rho$ satisfies: if $X \ge Y$ then 
$\rho(X) \ge \rho(Y)$. 

Hint: Use the monotonicity and subadditivity of coherent risk measures. 
Exercise 11.3 The purpose of this exercise is to prove Proposition 11.6 under 
some additional assumptions. Assume $Y$ is a continuous loss function with density 
$f(y)$ and $\alpha \in (0,1)$. Let $g(\gamma)$ be defined as 

\[
g(\gamma) := \gamma + \frac{1}{1-\alpha} \int_{-\infty}^{\infty} \max(y - \gamma, 0)\, f(y)\, dy.
\]

(a) Show that 

\[
g'(\gamma) = 1 - \frac{1}{1-\alpha} \int_{\gamma}^{\infty} f(y)\, dy.
\]

(b) Let $\bar\gamma$ be the minimizer of the optimization problem 

\[
\min_{\gamma} \ g(\gamma).
\]

Use part (a) to show that 

\[
\mathrm{VaR}_\alpha(Y) = \bar\gamma.
\]

(c) Let $\bar\gamma$ be as in part (b). Use parts (a) and (b) to show that 

\[
\mathrm{CVaR}_\alpha(Y) = g(\bar\gamma).
\]
Exercise 11.4 Use Proposition 11.6 to prove that CVaR is subadditive; that is, 

\[
\mathrm{CVaR}_\alpha(Y + W) \le \mathrm{CVaR}_\alpha(Y) + \mathrm{CVaR}_\alpha(W)
\]

for any two loss functions $Y$, $W$ and $\alpha \in (0,1)$. 
Exercise 11.5 

(a) Suppose a loss function $Y$ has normal distribution with mean $\mu$ and variance 
$\sigma^2$; that is, $Y \sim N(\mu, \sigma^2)$. For $\alpha \in (0,1)$ determine both $\mathrm{VaR}_\alpha(Y)$ and 
$\mathrm{CVaR}_\alpha(Y)$ in terms of the standard normal cumulative function 

\[
\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.
\]

In particular, show that 

\[
\mathrm{VaR}_{0.95}(Y) = \mu + 1.645\,\sigma, \qquad \mathrm{CVaR}_{0.95}(Y) = \mu + 2.06\,\sigma.
\]

(b) Suppose a loss function $Y$ has lognormal distribution with log mean $\mu$ 
and log variance $\sigma^2$; that is, $\log(Y) \sim N(\mu, \sigma^2)$. For $\alpha \in (0,1)$ determine 
both $\mathrm{VaR}_\alpha(Y)$ and $\mathrm{CVaR}_\alpha(Y)$ in terms of the standard normal cumulative 
function 

\[
\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt.
\]

Exercise 11.6 The Excel spreadsheet “Exercise 11.6 Twelve Portfolios” pro- 
vides scenarios (based on historical data) for the joint annual returns of 12 
industry portfolios in the US market. 


(a) Any given portfolio attains its “worst” return in some scenario. For example, 
a “NoDur” portfolio attained its worst return in 1931 whereas a “Hlth” 
portfolio attained its worst return in scenario 1929. 

Find a long-only portfolio that maximizes the worst possible return. 

(b) Find the long-only portfolio that maximizes the expected return while ensur- 
ing that its worst possible return is no lower than 2% below what you found 
in part (a). 


Exercise 11.7 The Excel spreadsheet “Exercise 11.7 Three Bonds” provides a 
(hypothetical) discrete distribution for the future value of three different bonds. 
In each case, the first value is the “benchmark” future value in the case of 
no credit quality change. For simplicity assume the three bonds have the same 
current value, say $100. 

The spreadsheet also has the joint distribution for the future values of the 
three bonds (64 = 4 x 4 x 4 possible scenarios) assuming that the probability 
distributions are independent. 


(a) For a given bond portfolio $x = [x_1 \ x_2 \ x_3]^\top$, use the loss function 
discussed in Section 11.3, 

\[
Y = (b - w)^\top x = \sum_{j=1}^{3} (b_j - w_j)\, x_j,
\]

where $b_j$ is the benchmark future value of bond $j$, and $w_j$ is the random 
future value of bond $j$ for $j = 1, 2, 3$. Determine the $\mathrm{VaR}_{0.95}$ and $\mathrm{CVaR}_{0.95}$ 
values of the portfolio $x = [0.4 \ 0.1 \ 0.5]^\top$. 


(b) Set up a CVaR optimization model to find a portfolio $x = [x_1 \ x_2 \ x_3]^\top$ 
with $x_j \ge 0$, $x_1 + x_2 + x_3 = 1$, with the same benchmark future value as 
that of the portfolio $[0.4 \ 0.1 \ 0.5]^\top$, and with minimum $\mathrm{CVaR}_{0.95}$. What 
is the $\mathrm{VaR}_{0.95}$ value of the optimal portfolio? What is the optimal portfolio? 


Exercise 11.8 The Excel spreadsheet “Exercise 11.8 Six Bonds” provides a 
(hypothetical) discrete distribution for the annual return of six different bonds. 

The return of each individual bond only has two possible values. The first 
value is the “benchmark” return in the case of no credit quality downgrade. The 
second value is the return in the case of credit downgrade. 

The Excel file also contains the joint distribution for the returns of the six 
bonds ($64 = 2^6$ possible scenarios) assuming that the probability distributions 
are independent. 

Suppose you currently have $100 and need to fulfill an obligation of $115 in 
a year. You intend to invest the $100 in the six bonds to try to meet the $115 
obligation. You realize that because of the bonds’ credit risks you may not be 
able to meet this financial goal in some scenarios. In such a case you will need 
extra money to cover the shortfall; that is, the difference between the obligation 
and whatever you can cover. For instance, if the bond portfolio in a year is worth 
$105 < $115, then the shortfall would be $10 = $115 — $105. On the other hand, 
if the bond portfolio in a year is $120 > $115, then the shortfall would be zero. 

(a) Formulate a linear programming model to determine how much should be 
invested in each bond so that the expected value of the shortfall a year from 
now is minimized. Assume the portfolio must be long-only. 

Formulate your linear programming model in Excel and solve it. 

What is the composition of the optimal portfolio, that is, the amount of 
money in bond i, for i = 1,...,6? 

What is the expected value of the shortfall? 

What is the value of the worst (largest) possible shortfall? 

(b) Define the loss function of your bond portfolio as the value of the liability 
minus the value of your portfolio. 

(i) Compute the numerical value of $\mathrm{CVaR}_{0.95}$ of this loss function if your 
portfolio is equally divided among the six bonds. 

(ii) Find the long-only portfolio of these six bonds that minimizes $\mathrm{CVaR}_{0.95}$. 
What is the optimal $\mathrm{CVaR}_{0.95}$ value of this portfolio? 


Multi-Period Models: Simple Examples 


The next few chapters will be devoted to multi-period models. Unlike the single- 
period models we have discussed so far, multi-period models incorporate the 
dynamic nature inherent when decisions are made at different stages. The deci- 
sions to be made at each particular stage can adapt to information collected 
in previous stages. In this chapter we discuss the following fundamental multi- 
period models: the Kelly criterion for repeated gambles, dynamic portfolio opti- 
mization with myopic strategies, and optimal scheduling of trades to control 
execution costs. The strong assumptions made in these models allow us to solve 
them with relatively simple techniques. Subsequent chapters will introduce more 
involved techniques for multi-period models, namely dynamic programming and 
stochastic programming. We will rely on these techniques to tackle more elaborate 
financial optimization models. 


The Kelly Criterion 


The Kelly criterion is a classical formula that maximizes the average rate 
of growth of a gambler’s fortune in a sequence of bets. It was published in the 
landmark paper by Kelly (1956). The formula has appeal among some investment 
professionals. In particular, there are claims that many successful investors, 
including Edward Thorp, Warren Buffett, and Bill Gross, use Kelly-like methods. 
The popular book Fortune’s Formula (Poundstone, 2005) gives a non-technical 
and engaging description of the Kelly criterion and its role in gambling and 
investing. 
The Kelly formula can be explained as follows. Suppose a gambler can enter 
a bet with two possible outcomes: lose the entire amount bet or win the amount 
bet. Assume the probability of winning is $p$. Suppose the gambler starts with some 
initial wealth $W_0$ and can take this gamble repeatedly. What fraction of her 
current wealth should she bet each time? 
To answer this question, let $W_n$ be the gambler’s wealth after $n$ gambles. The 
rate of growth of the gambler’s fortune is 

\[
g = \frac{1}{n} \log \frac{W_n}{W_0}.
\]

Suppose the gambler bets a fraction $f$ of her current wealth each time. Then 

\[
W_n = (1+f)^k (1-f)^{n-k}\, W_0,
\]

where $k$ is the number of wins (out of the $n$ gambles). 
Therefore we get 

\[
g = \frac{k}{n} \log(1+f) + \frac{n-k}{n} \log(1-f).
\]

Taking expectations, we get 

\[
E(g) = p \log(1+f) + (1-p) \log(1-f).
\]

Setting the derivative $p/(1+f) - (1-p)/(1-f)$ to zero shows that, for $1/2 < p < 1$, 
this concave function of $f$ attains its maximum at $f = 2p - 1$. That is, by betting the 
fraction $2p - 1$ each time, the gambler maximizes the expected growth rate $E(g)$. 
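
As a quick numerical check of this formula, the following Python sketch evaluates the expected growth rate on a grid of betting fractions and compares the best grid point with $2p - 1$. The grid search and the function names are ours and serve only as an illustration.

```python
import math


def expected_growth(f, p):
    """Expected growth rate E(g) = p log(1 + f) + (1 - p) log(1 - f), for 0 <= f < 1."""
    return p * math.log(1.0 + f) + (1.0 - p) * math.log(1.0 - f)


def kelly_fraction(p):
    """Closed-form maximizer 2p - 1 (meaningful when 1/2 < p < 1)."""
    return 2.0 * p - 1.0


def grid_argmax(p, steps=10000):
    """Best betting fraction on the grid {0, 1/steps, ..., (steps - 1)/steps}."""
    grid = [i / steps for i in range(steps)]
    return max(grid, key=lambda f: expected_growth(f, p))


if __name__ == "__main__":
    p = 0.6
    print(kelly_fraction(p), grid_argmax(p))
    print(expected_growth(kelly_fraction(p), p))
```

Betting more than the Kelly fraction lowers the expected growth rate, and a sufficiently large fraction makes it negative.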

The above reasoning and formula can be extended to gambles where the payoff 
does not necessarily match the amount bet, and to gambles with a non-binary 
outcome. We can indeed see the Kelly criterion as a special case of the dynamic 
portfolio optimization model discussed next. 


Dynamic Portfolio Optimization 


Our first dynamic portfolio optimization model concerns the expected utility 
of final wealth, assuming the portfolio can be rebalanced at intermediate steps. 
More specifically, suppose that an investor starts at $t = 0$ with an initial endowment 
$W_0$. At times $t = 0, \dots, T-1$ the investor invests her wealth $W_t$ in a 
portfolio of one risk-free asset and any number of risky assets. The investor’s 
goal is to maximize the utility of terminal wealth $U(W_T)$ at time $T$ for a suitable 
utility function $U(W)$. We next describe a formal model for this dynamic portfolio 
optimization problem. To that end, we introduce the following convenient 
notation: 

• $R_{t+1}$ = gross random returns of the assets in period $[t, t+1]$, 
  $R_{f,t+1}$ = gross risk-free return in period $[t, t+1]$, 
  $R_{p,t+1}$ = gross random return of the investor’s portfolio in period $[t, t+1]$. 
• Decision variables: $x_t$ = holdings (percentages) in the risky assets at time $t$. 
• Inter-temporal constraints (also known as law of motion): 

\[
W_{t+1} = W_t \cdot R_{p,t+1} = W_t \cdot \left(R_{f,t+1} + (R_{t+1} - R_{f,t+1}\mathbf{1})^\top x_t\right), \quad t = 0, \dots, T-1.
\]

• Objective: $\max E[U(W_T)]$. 
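
The law of motion is easy to trace along a single scenario. The following Python sketch propagates wealth through given paths of holdings and gross returns; the function name and the numbers in the example are hypothetical.

```python
def terminal_wealth(w0, holdings_path, risky_returns_path, risk_free_path):
    """Apply W_{t+1} = W_t * (R_f + (R - R_f 1)^T x_t) along one scenario.

    holdings_path[t]      -- list of fractions x_t held in the risky assets
    risky_returns_path[t] -- list of gross risky returns over [t, t+1]
    risk_free_path[t]     -- gross risk-free return over [t, t+1]
    """
    wealth = w0
    for x_t, r_next, rf_next in zip(holdings_path, risky_returns_path, risk_free_path):
        excess = sum(x * (r - rf_next) for x, r in zip(x_t, r_next))
        wealth *= rf_next + excess  # gross portfolio return R_p
    return wealth


if __name__ == "__main__":
    # Hypothetical two-period scenario with one risky asset held at 50%.
    print(terminal_wealth(100.0, [[0.5], [0.5]], [[1.2], [0.9]], [1.0, 1.0]))
```

The fraction not held in risky assets, $1 - \mathbf{1}^\top x_t$, is implicitly invested in the risk-free asset.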
Optimality of Myopic Policies 


The solution of a multi-period model typically requires dynamic or stochastic 
programming techniques that we will cover in later chapters. However, under 
suitable assumptions the above model is sufficiently simple that we can solve it 
directly. Observe that the final accumulated wealth at stage $T$ is 

\[
W_T = W_0 \cdot \prod_{t=1}^{T} R_{p,t}.
\]

We will primarily consider the class of constant relative risk-aversion utilities 
given by the power utility $U(W) = W^{1-\gamma}/(1-\gamma)$ with risk aversion $\gamma > 0$, $\gamma \ne 1$, 
and the logarithmic utility $U(W) = \log(W)$. Observe that 
$\log(W) = \lim_{\gamma \to 1} (W^{1-\gamma} - 1)/(1-\gamma)$. 

For a power utility $U(W) = W^{1-\gamma}/(1-\gamma)$ we get 

\[
(1-\gamma)\, U(W_T) = W_T^{1-\gamma} = W_0^{1-\gamma} \cdot \prod_{t=1}^{T} R_{p,t}^{1-\gamma}.
\]

For logarithmic utility $U(W) = \log(W)$, we get 

\[
U(W_T) = \log(W_0) + \sum_{t=1}^{T} \log(R_{p,t}).
\]
From these expressions for the utility of final wealth, we can readily reach the 
following conclusions: 


• If the risk-free return $R_{f,t} = R_f$ is the same for $t = 1, \dots, T$ and the risky 
returns $R_t$ are independent for $t = 1, \dots, T$, then each $x_t$ is the solution to 
a single-period problem. In other words, a myopic policy is optimal. 

• If the risk-free return $R_{f,t} = R_f$ is the same for $t = 1, \dots, T$ and the 
risky returns $R_t$ are independent and identically distributed (i.i.d.) for 
$t = 1, \dots, T$, then all $x_t$ are the same. 

• For $U(W) = \log(W)$, a myopic policy is optimal regardless of the distribution 
of the returns $R_t$, $R_{f,t}$ for $t = 1, \dots, T$. 
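
To see where the first conclusion comes from, the following worked step is a sketch under a simplifying assumption of ours that is not part of the model above: the holdings $x_0, \dots, x_{T-1}$ are fixed in advance rather than adapted to past returns. With a constant risk-free return $R_f$ and independent risky returns, the expectation of the product factors:

```latex
\begin{align*}
E\big[(1-\gamma)\,U(W_T)\big]
  &= W_0^{1-\gamma}\, E\Big[\prod_{t=1}^{T} R_{p,t}^{1-\gamma}\Big]
   = W_0^{1-\gamma} \prod_{t=1}^{T} E\big[R_{p,t}^{1-\gamma}\big], \\
R_{p,t} &= R_f + (R_t - R_f \mathbf{1})^\top x_{t-1}.
\end{align*}
```

Each factor is positive and involves only $x_{t-1}$. Hence, for either sign of $1-\gamma$, maximizing $E[U(W_T)]$ amounts to choosing each $x_{t-1}$ to maximize the single-period objective $E[R_{p,t}^{1-\gamma}/(1-\gamma)]$. If the returns are also identically distributed, these single-period problems coincide, which gives the second conclusion.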


The above conclusions can be related to the following two classical puzzles of 
finance (Kritzman, 2002): 


• Half stocks all the time or all stocks half the time? 
• Time diversification: Is it true that lengthening the investment horizon reduces 
risk? 


The first puzzle can be more precisely stated as follows. Suppose there is a 
risky asset “stocks” with expected return $\mu$ and standard deviation $\sigma$, and a 
risk-free asset with return $r < \mu$. Consider the following two possible dynamic 
investment strategies: 


• balanced strategy (50%, 50%) in every period; 
• switching strategy (100%, 0%) half of the time and (0%, 100%) the other half. 

Suppose the risk-free return is constant, the stock returns are i.i.d. across time, 
and an investor has a power utility. Which of the two strategies is preferable? 

The second puzzle can be more precisely stated as follows. Suppose there is 
a risky asset “stocks” with expected return $\mu$ and standard deviation $\sigma$, and a 
risk-free asset with return $r < \mu$. Suppose the risk-free return is constant, the 
stock returns are i.i.d. across time, and an investor has a power utility with risk 
aversion $\gamma > 0$. Is it true that if the investor’s investment horizon is $T > 1$ then 
she should initially hold a higher percentage of her portfolio in stocks than if her 
horizon is $T = 1$? In his book, Kritzman (2002) expertly discusses the answers 
to these puzzles as well as a few others. 


An Example Where a Myopic Policy Is Not Optimal 


It is important to understand that the above conclusion concerning the opti- 
mality of myopic policies relies on strong assumptions on the asset returns, the 
investor’s utility, and the fact that the investor only receives an initial endowment 
at time 0 and maximizes her utility of wealth at the final time T. The following 
simple example illustrates a case when a myopic policy is not optimal. 


• Consider a three-stage (two-period) problem, i.e., $T = 2$. 

• At $t = 0$ we can invest in a one-period bond or a two-period zero-coupon 
bond. 

• At $t = 0$ we know that the (gross) risk-free return over the first period is $R_{f,1} = 1.1$. 

• At $t = 0$ we know that at $t = 1$ the (gross) risk-free return over the second period will be as follows: 

\[
R_{f,2} = \begin{cases} 1.12 & \text{with prob } 1/2, \\ 1.08 & \text{with prob } 1/2. \end{cases}
\]

• The one-period bond is a contract that can be entered at t = 0 and delivers 
$1.1 at time t = 1 for each dollar invested. 

• The two-period bond is a contract that can be entered at t = 0 and delivers 
$1.2096 at time t = 2 for each dollar invested. The value of the two-period 
bond at time t = 1 depends on the risk-free return at that time. Its 
value V at t = 1 per dollar invested is the 1.2096 final payout discounted 
at the applicable rate: 

\[
V = \begin{cases} \dfrac{1.2096}{1.12} = 1.08 & \text{with prob } 1/2, \\[1ex] \dfrac{1.2096}{1.08} = 1.12 & \text{with prob } 1/2. \end{cases}
\]

• A myopic investor is one with investment horizon $T = 1$; a long-term investor 
is one with investment horizon $T = 2$. 

• What would a risk-averse myopic investor do at time 0? 

• What would a risk-averse long-term investor do at time 0? 
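
To make the comparison concrete, the following Python sketch tabulates the wealth per dollar invested under each choice and each interest-rate scenario, for both horizons. It assumes that proceeds of the one-period bond are reinvested at the prevailing one-period rate; the function names are ours.

```python
SCENARIO_RATES = [1.12, 1.08]  # gross risk-free return over the second period
FIRST_PERIOD_RATE = 1.1        # gross risk-free return over the first period
TWO_PERIOD_PAYOUT = 1.2096     # payout at t = 2 per dollar in the two-period bond


def wealth_at_horizon_1():
    """Wealth at t = 1 per dollar invested at t = 0, one entry per scenario."""
    one_period = [FIRST_PERIOD_RATE for _ in SCENARIO_RATES]
    two_period = [TWO_PERIOD_PAYOUT / r for r in SCENARIO_RATES]  # market value V
    return {"one-period bond": one_period, "two-period bond": two_period}


def wealth_at_horizon_2():
    """Wealth at t = 2 per dollar invested at t = 0, one entry per scenario."""
    rolled = [FIRST_PERIOD_RATE * r for r in SCENARIO_RATES]  # reinvest at t = 1
    held = [TWO_PERIOD_PAYOUT for _ in SCENARIO_RATES]
    return {"one-period bond, rolled over": rolled, "two-period bond": held}


if __name__ == "__main__":
    print(wealth_at_horizon_1())
    print(wealth_at_horizon_2())
```

The tabulation shows that which of the two bonds is riskless depends on the investor's horizon.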
Execution Costs 


The efficient management of trading costs is a challenge to all institutional 
investors. These costs are associated with commissions, bid–ask spreads, opportunity 
costs of waiting, and the price impact of trading. These types of costs 
generally have a substantial impact on investment performance. For instance, 
a classical study of Pérold (1988) shows that a hypothetical “paper” portfolio 
constructed according to Value Line rankings outperformed the market by almost 
20% per year during the period from 1965 to 1986. However, the actual portfolio, the 
Value Line Fund, outperformed the market by only 2.5% per year. The difference 
between these figures arises from execution costs. This “implementation 
shortfall” is surprisingly large and highlights the importance of execution-cost 
control, particularly for institutional investors whose trades often constitute a 
large fraction of the average trading volume of many stocks. A common and 
intuitive practice is to spread large execution orders over a period of time, e.g., 
a few hours or days. This scheduling of trades aims to find a balance between 
two conflicting objectives: on the one hand, fast execution generates large market 
impact and consequently generates large costs. These costs are related to liquidity 
as well as leakage of information. On the other hand, delayed execution reduces 
market impact but comes at the expense of greater uncertainty and opportunity 
risk. Suitable models of price dynamics and market impact lead to insightful 
conclusions on the tradeoff faced between these extremes. 

One of the first formal models for trade execution was proposed by Bertsimas 
and Lo (1998) and is based on dynamic programming techniques. As we will 
see, their model shows that, under suitable conditions, the naive strategy of 
dividing a large order equally across the trading period minimizes expected 
trading cost. However, the model concentrates on expected cost only and does 
not take into consideration the risk (variance) of trading costs. This led Almgren 
and Chriss (2000) to propose a model that finds an optimal tradeoff between 
expected cost and risk. We next discuss the Almgren–Chriss trade execution 
model in detail. In its original form, this model can be presented in a relatively 
simple conceptual framework. Furthermore, the Almgren–Chriss model 
underlies several more elaborate execution models developed over the last few 
years. 


Almgren–Chriss Trade Execution Model 


We first introduce formal definitions associated with a trading strategy and price 
dynamics for the execution of a sell program for a single security. The definitions 
and model for a buy program are similar. 

Assume we hold a block of $X$ units of a security that needs to be liquidated 
by time $T$. Divide the time interval $[0,T]$ into $N$ intervals of length $\tau := T/N$, 
and define the discrete times $t_k = k\tau$, for $k = 0, 1, \dots, N$. 
Define a trading trajectory as a vector 

\[
x = [x_0 \ \cdots \ x_N]^\top.
\]

Here $x_k$ is the number of units that we plan to hold at time $t_k$. We have boundary 
conditions associated with our initial holding, i.e., $x_0 = X$, and liquidation at 
time $T$, i.e., $x_N = 0$. 

The trading trajectory implies a trade list 

\[
y = [y_1 \ \cdots \ y_N]^\top,
\]

where $y_k = x_{k-1} - x_k$. Each $y_k$ is the number of units sold in the time interval 
$[t_{k-1}, t_k]$. 

An execution trading strategy is a rule for determining the trade size $y_k$ given 
the information available at time $t_{k-1}$. 

We also need a model for the price dynamics of the security. Assume the 
initial security price (at time 0) is $S_0$. The security price evolves according to 
two exogenous factors, volatility and drift, and one endogenous factor, market 
impact. Volatility and drift are assumed to be the result of market forces that 
occur independently of our trading. On the other hand, as market participants 
begin to detect the volume we are selling, they naturally adjust their bids down- 
ward. We distinguish two types of market impact: temporary and permanent. 
Temporary impact is the change in price in a single time interval due to the 
imbalance between supply and demand occurring as a result of our trading. 
Permanent impact is the equilibrium change in price due to our trading that 
lasts for the entire life of our liquidation. 

Assume the security price evolves according to a discrete arithmetic random 
walk in addition to a term that accounts for permanent impact. The security 
price at time $t_k$ is given by 

\[
S_k = S_{k-1} + \sigma \tau^{1/2} \xi_k - \tau\, g\!\left(\frac{y_k}{\tau}\right)
\]

for $k = 1, \dots, N$. Here $\sigma$ represents the asset volatility, $\xi_k \sim N(0,1)$, and the 
permanent impact $g(v)$ depends on the average rate of trading $v = y_k/\tau$ during 
the interval $[t_{k-1}, t_k]$. 

We next incorporate the temporary market impact. The intuition is that a 
trader that liquidates $y_k$ units during the interval $[t_{k-1}, t_k]$ may see the price 
decrease as a result of limited liquidity. We assume that this effect is short-lived 
and in particular liquidity returns after each time interval. To model this impact, 
we incorporate a price impact function $h(v)$ that affects the actual price per share 
$\tilde S_k$ received for trade $y_k$: 

\[
\tilde S_k = S_{k-1} - h\!\left(\frac{y_k}{\tau}\right).
\]

However, the effect of $h(v)$ does not appear in the next market price $S_k$. 

Given the above trading model, we can compute the execution cost resulting 
from trading along a certain trajectory. The captured value of a trading trajectory 
is the total revenue obtained after liquidation. Some straightforward calculations 
show that this equals 

\[
\sum_{k=1}^{N} \tilde S_k y_k = S_0 X + \sum_{k=1}^{N} \left(\sigma \tau^{1/2} \xi_k - \tau\, g\!\left(\frac{y_k}{\tau}\right)\right) x_k - \sum_{k=1}^{N} y_k\, h\!\left(\frac{y_k}{\tau}\right).
\tag{12.1}
\]

The total cost of trading or implementation shortfall is the difference between 
the initial book value of the position and the captured value: 

\[
C(x) = X S_0 - \sum_{k=1}^{N} \tilde S_k y_k = \sum_{k=1}^{N} y_k\, h\!\left(\frac{y_k}{\tau}\right) - \sum_{k=1}^{N} \left(\sigma \tau^{1/2} \xi_k - \tau\, g\!\left(\frac{y_k}{\tau}\right)\right) x_k.
\]

Consequently, given the above price dynamics, it follows that the expected 
shortfall $E(x)$ and variance of shortfall $V(x)$ are respectively 

\[
E(x) = E(C(x)) = \sum_{k=1}^{N} \left(y_k\, h\!\left(\frac{y_k}{\tau}\right) + \tau x_k\, g\!\left(\frac{y_k}{\tau}\right)\right)
\]

and 

\[
V(x) = \sigma^2 \sum_{k=1}^{N} \tau x_k^2.
\]

The units of $E(x)$ are dollars, and the units of $V(x)$ are dollars squared. 
For simplicity of exposition, we will make the following assumptions: 


• the temporary impact function is linear: $h(v) = \eta v$; 
• there is no permanent impact; 
• $\tau = 1$. 


It is possible to extend the model and results when these assumptions are relaxed. 
Under the above assumptions, the expected shortfall is 

\[
E(x) = \sum_{k=1}^{N} y_k\, h(y_k) = \eta \sum_{k=1}^{N} y_k^2.
\]

Consider the problem of finding the trading trajectory that minimizes expected 
shortfall: 

\[
\begin{array}{ll}
\min\limits_{x} & \eta \displaystyle\sum_{k=1}^{N} y_k^2 \\
\text{s.t.} & \displaystyle\sum_{k=1}^{N} y_k = X.
\end{array}
\]

It is easy to see that the solution to this problem is the equally divided trade list 

\[
y_k = \frac{X}{N}, \quad k = 1, \dots, N.
\]

This corresponds to the linear trajectory 

\[
x_k = \frac{N-k}{N}\, X, \quad k = 1, \dots, N.
\]

This linear trajectory has a natural connection to the so-called volume-weighted 
average price (VWAP) strategy. The VWAP over the trading period $[0, T]$ is 
defined as 

\[
\mathrm{VWAP} := \frac{\sum_{k=1}^{N} V_k S_k}{\sum_{k=1}^{N} V_k} = \sum_{k=1}^{N} u_k S_k.
\]

Here $V_k$ stands for the volume traded during the $k$th time interval $[t_{k-1}, t_k]$ and 
$u_k$ stands for the percentage of daily volume traded during the same interval. 

The VWAP strategy trades in proportion to the traded volume during an 
interval, i.e., it is the following strategy: 

\[
y_k := u_k X.
\]

It is easy to see that the VWAP strategy minimizes expected shortfall when the 
temporary impact function is linear in the fraction of total volume traded, i.e., 
when $h(y_k) = \eta \cdot (y_k / V_k)$. 

Observe that the linear trajectory 

\[
x_k = \frac{N-k}{N}\, X, \quad k = 1, \dots, N
\]

has expected shortfall 

\[
E(x) = \eta\, \frac{X^2}{N}
\]

and variance 

\[
V(x) = \frac{(N-1)(2N-1)}{6N}\, \sigma^2 X^2.
\]

Consider the extreme urgency strategy that liquidates the entire position during 
the first period: 

\[
y_1 = X, \quad y_2 = \cdots = y_N = 0, \qquad x_1 = \cdots = x_N = 0.
\]

This trajectory has variance zero and expected shortfall $\eta X^2$. 
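
Both strategies can be checked numerically. The following Python sketch evaluates $E(x)$ and $V(x)$ directly from a trajectory under the simplifying assumptions above (linear temporary impact, no permanent impact, $\tau = 1$); the function names and parameter values are ours.

```python
def trade_list(x):
    """Trades y_k = x_{k-1} - x_k implied by a trajectory x = [x_0, ..., x_N]."""
    return [x[k - 1] - x[k] for k in range(1, len(x))]


def expected_shortfall(x, eta):
    """E(x) = eta * sum of y_k^2."""
    return eta * sum(y * y for y in trade_list(x))


def shortfall_variance(x, sigma):
    """V(x) = sigma^2 * sum of x_k^2 over k = 1, ..., N."""
    return sigma ** 2 * sum(xk * xk for xk in x[1:])


def linear_trajectory(X, N):
    """Holdings x_k = X (N - k) / N, i.e., equal trades of X / N."""
    return [X * (N - k) / N for k in range(N + 1)]


def immediate_trajectory(X, N):
    """Extreme urgency: sell everything in the first period."""
    return [X] + [0.0] * N


if __name__ == "__main__":
    X, N, eta, sigma = 1000.0, 10, 0.01, 0.5  # hypothetical parameters
    for name, x in [("linear", linear_trajectory(X, N)),
                    ("immediate", immediate_trajectory(X, N))]:
        print(name, expected_shortfall(x, eta), shortfall_variance(x, sigma))
```

The linear trajectory has the lower expected shortfall, whereas the immediate liquidation has the lower variance.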


Efficient Frontier of Optimal Execution 


The two execution strategies above suggest that we consider a tradeoff between 
the two objectives E(x) and V(x). In analogy to Markowitz’s mean-variance 
framework, Almgren and Chriss (2000) define an execution strategy to be efficient 
if no other strategy has both lower expected shortfall and lower variance. 
Just like in the mean-variance context, there are several equivalent formulations 
for efficient execution strategies. A computationally convenient formulation is: 

\[
\begin{array}{ll}
\min\limits_{x} & E(x) + \lambda V(x) \\
\text{s.t.} & x_0 = X \\
 & x_N = 0
\end{array}
\tag{12.2}
\]

for some risk-aversion parameter $\lambda > 0$. For convenience, put $U(x) := E(x) + \lambda V(x)$. 
We can think of $U$ as a “disutility” function. Using the expressions for 
expected value and variance of shortfall, we obtain 

\[
U(x) = \eta \sum_{k=1}^{N} (x_k - x_{k-1})^2 + \lambda \sigma^2 \sum_{k=1}^{N} x_k^2.
\]

In particular, $U(x) = E(x) + \lambda V(x)$ is a convex quadratic function. The optimality 
conditions for (12.2) are 

\[
\frac{\partial U}{\partial x_k} = 2(\lambda \sigma^2 + 2\eta)\, x_k - 2\eta\, (x_{k-1} + x_{k+1}) = 0,
\]

for $k = 1, 2, \dots, N-1$, together with the boundary conditions 

\[
x_0 = X, \quad x_N = 0.
\]

The latter system of equations can be written as 

\[
\begin{bmatrix}
2 + \lambda\sigma^2/\eta & -1 & 0 & \cdots & 0 \\
-1 & 2 + \lambda\sigma^2/\eta & -1 & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & -1 & 2 + \lambda\sigma^2/\eta & -1 \\
0 & \cdots & 0 & -1 & 2 + \lambda\sigma^2/\eta
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{N-2} \\ x_{N-1} \end{bmatrix}
=
\begin{bmatrix} X \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix}
\]

and $x_0 = X$, $x_N = 0$. 
For $N$ large, the discussion below shows that the solution to this system of 
equations is approximately 

\[
x_j = \frac{\sinh(\kappa (N - j))}{\sinh(\kappa N)}\, X, \quad j = 0, 1, \dots, N,
\]

where $\kappa := \sqrt{\lambda\sigma^2/\eta}$ is the urgency parameter. 
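
The tridiagonal system above can be solved directly and compared with the hyperbolic-sine formula. The following Python sketch uses the standard tridiagonal (Thomas) elimination, which is our choice of method. It also uses the fact, verified by substituting into the optimality conditions, that the system is solved exactly by $x_j = X \sinh(\tilde\kappa (N-j))/\sinh(\tilde\kappa N)$ with $\cosh\tilde\kappa = 1 + \lambda\sigma^2/(2\eta)$; for small $\lambda\sigma^2/\eta$ we have $\tilde\kappa \approx \kappa$. All names and parameter values are ours.

```python
import math


def efficient_trajectory(X, N, lam, sigma, eta):
    """Solve tridiag(-1, c, -1) [x_1..x_{N-1}] = [X, 0, ..., 0], c = 2 + lam sigma^2 / eta."""
    c = 2.0 + lam * sigma ** 2 / eta
    n = N - 1
    if n == 0:
        return [X, 0.0]
    rhs = [X] + [0.0] * (n - 1)
    cp, dp = [0.0] * n, [0.0] * n  # modified upper diagonal and right-hand side
    cp[0], dp[0] = -1.0 / c, rhs[0] / c
    for i in range(1, n):
        denom = c + cp[i - 1]  # b_i - a_i * cp[i-1] with a_i = -1
        cp[i] = -1.0 / denom
        dp[i] = (rhs[i] + dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return [X] + x + [0.0]


def sinh_trajectory(X, N, kappa):
    """Holdings X sinh(kappa (N - j)) / sinh(kappa N), j = 0, ..., N (kappa > 0)."""
    return [X * math.sinh(kappa * (N - j)) / math.sinh(kappa * N) for j in range(N + 1)]


if __name__ == "__main__":
    X, N, lam, sigma, eta = 1000.0, 20, 0.01, 2.0, 1.0  # hypothetical parameters
    ratio = lam * sigma ** 2 / eta
    exact = sinh_trajectory(X, N, math.acosh(1.0 + ratio / 2.0))
    approx = sinh_trajectory(X, N, math.sqrt(ratio))
    numeric = efficient_trajectory(X, N, lam, sigma, eta)
    print(max(abs(a - b) for a, b in zip(numeric, exact)))
    print(max(abs(a - b) for a, b in zip(numeric, approx)))
```

A larger risk aversion $\lambda$ increases the urgency parameter and front-loads the trades.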


Efficient Frontier in Continuous Time 


We can extend the results from the previous section to the continuous-time 
setting. The mathematics is a bit cleaner and more elegant. This is an idealized 
model where we assume that the trade can be executed continuously over the 
time interval [0, T]. 


In this case we need to determine a continuous trading trajectory $x(t)$, $t \in 
[0,T]$, with boundary conditions $x(0) = X$, $x(T) = 0$. We extend the security 
price dynamics to the continuous-time setting. Again, for simplicity we shall 
assume that there is only temporary impact. The price dynamics for the market 
price is an arithmetic Brownian motion 

\[
S(t) = S(0) + \sigma B(t),
\]

and the actual execution price received at time $t$ is 

\[
\tilde S(t) = S(t) - \eta\, y(t),
\]

where $y(t) := -\dot x(t)$ is the rate of execution at time $t$. 
From properties of the stochastic integral, we get the following expression for 
the execution shortfall: 

\[
C(x) = X S(0) - \int_0^T \tilde S(t)\, y(t)\, dt = \eta \int_0^T \dot x(t)^2\, dt - \sigma \int_0^T x(t)\, dB(t).
\]

Therefore the expected shortfall $E(x)$ and variance of shortfall $V(x)$ are as 
follows: 

\[
E(x) = \eta \int_0^T \dot x(t)^2\, dt, \qquad V(x) = \sigma^2 \int_0^T x(t)^2\, dt.
\]

An efficient execution trajectory is the solution to the following problem: 


T 
min J (ni(t)? + Ao? x(t)?)dt 
x(t 0 

s.t. «&(0)=X 


«(T) =0. 


This is a problem in calculus of variations. The Euler equation for this prob- 
lem (see formula (A.1) in Section A.3) yields the following ordinary differential 
equation: 


with boundary conditions 
x(0)= X, «(T)=0. 
The solution to this differential equation is 
sinh(«(T' — t)) 
t) = ———_—_— -x T 
where « is an “urgency” parameter, defined by 


Ao? 
k := 4| —. 
n 
The parameter $\kappa$ has the following nice interpretation. The reciprocal $\theta := 1/\kappa$ is
measured in units of time and can be interpreted as the “half-life” of the trade.
More precisely, when $T \to \infty$, the trade is reduced by a factor of $e = 2.71828\ldots$
by time $\theta$.
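
The closed-form trajectory can be checked numerically. The following Python sketch evaluates $x(t)$ and the urgency parameter $\kappa$; the function names and all parameter values are hypothetical and chosen only for illustration, not taken from the text.

```python
import math


def urgency(lam, sigma, eta):
    """Urgency parameter kappa = sqrt(lam * sigma^2 / eta)."""
    return math.sqrt(lam * sigma ** 2 / eta)


def trajectory(t, X, T, kappa):
    """Remaining position x(t) = X * sinh(kappa (T - t)) / sinh(kappa T)."""
    return X * math.sinh(kappa * (T - t)) / math.sinh(kappa * T)


# Hypothetical parameter values, for illustration only.
X, T = 1_000_000.0, 5.0                # position to liquidate, horizon
lam, sigma, eta = 2e-6, 0.95, 2.5e-6   # risk aversion, volatility, impact
kappa = urgency(lam, sigma, eta)
theta = 1.0 / kappa                    # "half-life" of the trade

for t in (0.0, theta, T / 2, T):
    print(f"t = {t:5.2f}   x(t) = {trajectory(t, X, T, kappa):12.1f}")
```

For a very long horizon the position at time $\theta$ should be close to $X/e$, which can be confirmed by calling `trajectory(theta, X, 200.0, kappa)`.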

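It is insightful to verify the units of the various parameters of our model. The
units of $\sigma^2$, $\eta$, and $\lambda$ are as follows:

$$\sigma^2 : \frac{\text{currency}^2}{\text{volume}^2 \cdot \text{time}}, \qquad
\eta : \frac{\text{currency} \cdot \text{time}}{\text{volume}^2}, \qquad
\lambda : \frac{1}{\text{currency}}.$$

Recall that the urgency parameter is

$$\kappa = \sqrt{\frac{\lambda \sigma^2}{\eta}}.$$

Therefore, the units of $\theta = 1/\kappa$ are indeed units of time.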


Multiple-Security Portfolios 


The previous execution model and results can be extended to the case when
we need to liquidate a whole portfolio $X = [X_1 \; \cdots \; X_m]^{\top}$ of $m$ securities.
In this case, the trading trajectory is a sequence of $m$-dimensional vectors
$x_k = [x_{1k} \; \cdots \; x_{mk}]^{\top}$, for $k = 0, \ldots, N$. The trade list is also a sequence of
$m$-dimensional vectors $y_k = x_{k-1} - x_k$, $k = 1, \ldots, N$.

For simplicity we shall assume that there is only a linear temporary impact
and $\tau = 1$. Hence the security prices $S_k$ follow a multi-dimensional random walk:

$$S_k = S_{k-1} + \xi_k.$$

Here $\xi_k \sim N(0, \Sigma)$, where $\Sigma$ is the covariance matrix of the $m$ security prices.
We assume $\Sigma$ to be symmetric and positive definite.
The prices actually received are

$$\tilde{S}_k = S_k - H y_k,$$

where $H$ is symmetric and positive semidefinite.
Proceeding as before, we get the following expressions for expected shortfall 
and variance of shortfall respectively: 


$$E(x) = \sum_{k=1}^{N} y_k^{\top} H y_k = \sum_{k=1}^{N} (x_k - x_{k-1})^{\top} H (x_k - x_{k-1})$$

and

$$V(x) = \sum_{k=1}^{N} x_k^{\top} \Sigma\, x_k.$$

Now the set of efficient trading strategies is characterized by the solutions to the
quadratic program:


$$\begin{array}{rl}
\min\limits_{x} & E(x) + \lambda V(x) \\
\text{s.t.} & x_0 = X \\
 & x_N = 0.
\end{array}$$


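This is again a convex quadratic optimization problem. Its solution is

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{N-2} \\ x_{N-1} \end{bmatrix}
=
\begin{bmatrix}
2H + \lambda\Sigma & -H & 0 & \cdots & 0 \\
-H & 2H + \lambda\Sigma & -H & \cdots & 0 \\
\vdots & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & -H & 2H + \lambda\Sigma & -H \\
0 & \cdots & 0 & -H & 2H + \lambda\Sigma
\end{bmatrix}^{-1}
\begin{bmatrix} HX \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix}$$

and $x_0 = X$, $x_N = 0$.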

Unlike for the one-security model, the above solution may not necessarily
satisfy the monotonicity constraints $x_k \le x_{k-1}$, with $k = 1, \ldots, N$. This means
that the above strategy may have trades that are “buys” at intermediate steps,
even though the execution is meant to liquidate a vector of positions. If this
possibility is not desirable, we can introduce the constraints $x_k \le x_{k-1}$, for
$k = 1, \ldots, N$, into the optimization problem for efficient trajectories. The resulting
model no longer has a closed-form solution but it is still a quadratic program.
For particularly large portfolios, the size of the quadratic programs poses an 
interesting computational challenge. 


Adaptive Strategies 


The models described above in discrete and continuous time assume static tra- 
jectories. That is, the trajectories do not respond to changes during execution. It 
is conceivable that an adaptive strategy that depends on the initial portion of the 
trajectory could do better. In order to solve these kinds of optimization problems 
we need to rely on dynamic programming techniques. These are generally more 
challenging optimization problems. The next chapter introduces this powerful 
technique. 


Trade Execution Models in Practice 


A trade execution model used by an institutional investor typically includes 
other bells and whistles that we have not discussed, such as short-term alpha, 
spread, permanent impact, and temporary impact. These additional features 
can be incorporated in the model discussed in Section 12.3.1. An important 
empirical observation is that, instead of a linear market impact, other forms 
of market impact such as 1/2 or 3/5 powers of volume traded appear to be 
more appropriate. In these cases the optimal execution problem is no longer a 
quadratic program but it is still a convex program. 


The estimation of market impact is a challenging practical problem. One
difficulty is the need for data at the execution level. Furthermore, even when 
such data are available, market impact is not directly observable. Instead, we can 
only observe the total realized impact, which includes permanent and temporary 
impact as well as some random noise. Almgren et al. (2005) estimate the market 
impact using linear regression, based on a large dataset of US equity brokerage 
executions from Citigroup. The model is used to calibrate a version of the 
Almgren-Chriss model with nonlinear temporary costs and is validated with
out-of-sample backtesting. 


Exercises 


Exercise 12.1 Show that the Kelly criterion can be seen as a special case of 
the dynamic portfolio optimization problem with the logarithmic utility of the 
final wealth and a peculiar pair of risk-free and risky assets. Conclude that if 
the payoff for each dollar invested is $b$ instead of 1, then the optimal fraction to
bet is

$$\frac{(b + 1)p - 1}{b}.$$

Exercise 12.2 Consider the following variation of the Kelly criterion. Suppose 
at each betting round the gambler can bet on two independent gambles. For each 
of the two gambles the following applies: if a gambler bets one dollar, then she 
wins one dollar with probability $p$ and loses the dollar she bet with probability
$1 - p$.

Suppose the gambler starts with some initial wealth $W_0$ and repeatedly bets on
the above two gambles. Determine the fractions $f_1$, $f_2$ of wealth that she should
bet at each round in each of the two gambles to maximize the average growth 
rate of her wealth. 


Exercise 12.3 Recall the example described in Section 12.2.2. You may choose 
to prove the statements below formally or to verify them numerically. For the 
latter, you may use the Excel spreadsheet “Exercise 12.3 Two Periods”. 


(a) Suppose a myopic investor is risk-neutral; that is, her objective is to maximize
$E(W_1)$ by investing her initial wealth in a long-only portfolio composed
of the one-period and the two-period bonds. Show that the investor would
be indifferent between the one- and the two-period bonds. In other words,
any long-only portfolio is optimal for the investor.

(b) Suppose a myopic investor has power utility with risk-aversion parameter
$\gamma > 0$; that is, her objective is to maximize $E\left(W_1^{1-\gamma}/(1-\gamma)\right)$. Show that
the investor would prefer to hold her entire portfolio in the one-period bond.

(c) Suppose a long-term investor is risk-neutral; that is, her objective is to
maximize the expected wealth $E(W_2)$ at time 2. Show that the investor
would choose to place her entire portfolio in the one-period bond at $t = 0$
and roll it over at the risk-free rate at $t = 1$.

(d) Suppose a long-term investor has power utility with risk-aversion parameter
$\gamma > 1$; that is, her objective is to maximize $E\left(W_2^{1-\gamma}/(1-\gamma)\right)$. Show that
the investor would prefer to hold part of her entire portfolio in the two-period
bond. Furthermore, show that the higher the risk aversion $\gamma$, the higher the
holding in the two-period bond.


Exercise 12.4 Prove identity (12.1) in the Almgren-Chriss model.


Exercise 12.5 Consider the following variation of the one-asset trading model
of Almgren and Chriss. Assume the security price at period $k$ is

$$S_k = S_{k-1} + \xi_k$$

and the actual security price received at period $k$ is

$$\tilde{S}_k = S_{k-1} - h_k(y_k),$$

where $h_k(y_k) = c\, y_k / v_k$, $c$ is a constant, and $v_k$ is the volume traded during the
$k$th interval $[k - 1, k]$.


(a) Write down the expression for the shortfall, as a function of the trading
trajectory $[x_0 \; x_1 \; \cdots \; x_N]$ and/or the trading list $[y_1 \; \cdots \; y_N]$.

(b) Write down the formulation for the problem of finding the trading list
$[y_1 \; \cdots \; y_N]$ that minimizes expected shortfall.

(c) Prove that the solution to the problem in part (b) is the VWAP strategy 


Exercise 12.6 Suppose today (year t = 0) you have an initial endowment 
$W_0 = \$10{,}000$. Now (beginning of year 0), and at the beginning of the next 19
years, you can allocate your wealth to two investment choices: 


1. A risk-free asset “cash” that generates a 5% annual return. 
2. A risky asset “stocks” with annual return that is normally distributed with 
mean 10% and standard deviation 20%. 


Let $W_{20}$ denote the endowment at the end of the 20-year investment period
(i.e., beginning of year 20). 
Consider the following three investment strategies: 


1. Buy and hold: Invest the initial endowment 50% in cash and 50% in stocks 
and never rebalance. 

2. Balanced: Rebalance the portfolio at the beginning of every year to a 50% 
cash and 50% stocks mix. 

3. Switching: Alternate each year between 100% cash and 100% stocks. 
The Excel spreadsheet “Exercise 12.6 Twenty Years” contains a random sample
of the risk-free and risky returns over a 20-year period.


(a) Compute the total accumulated return achieved by each of the three strategies.
That is,

$$\frac{W_{20}}{W_0} - 1.$$

(b) Compute the annualized return achieved by each of the three strategies.
That is,

$$\left(\frac{W_{20}}{W_0}\right)^{1/20} - 1.$$

(c) Use your favorite simulation software to generate 10,000 random samples of
the risk-free and risky returns over a 20-year period. Report the sample mean
and sample variance of the total accumulated return and of the annualized
return for each of the three strategies.

(d) Produce charts (e.g., histograms) to visualize the distribution of total
accumulated and annualized returns achieved by the three strategies. Which
strategy seems to have a higher expected annualized return? Which one
seems to have a higher variance of annualized return?

(e) Which strategy would you prefer? Why?


Dynamic Programming: Theory and Algorithms


Dynamic programming is an approach to model and solve multi-period decision 
problems. The fundamental principle of dynamic programming is the Bellman 
equation, a certain kind of optimality condition. As we detail in this chapter, the 
central idea of the Bellman equation is to break down a multi-stage problem into 
multiple two-stage problems. Under suitable conditions, the Bellman equation 
yields a recursion that helps in characterizing the solution and in computing it. 
Before embarking on a formal description, we illustrate the dynamic program- 
ming approach via some examples. 


Some Examples 


Example 13.1 (Matches puzzle) Suppose there are 30 matches on a table and 
I play the following game with a clever opponent: I begin by picking up 1, 2, 
or 3 matches. Then my opponent must pick 1, 2, or 3 matches. We continue 
alternating until the last match is picked up. The player who picks up the last 
match loses. How can I (the first player) be sure of winning? 


Solution. If I can ensure that it will be my opponent’s turn when 1 match remains, 
I certainly win. Let us work backwards one step: If I can ensure that it will be 
my opponent’s turn when 5 matches remain, I will also win. The reason for this 
is that no matter what he does when there are 5 matches left, I can make sure 
that when he has his next turn, only 1 match will remain. Hence it is clear 
that I win if I can force my opponent to play when 5 matches remain. We can 
continue working backwards and conclude that I will ensure victory if I can force 
my opponent to play when 5, 9, 13, 17, 21, 25, or 29 matches remain. Since 
the game starts with 30 matches on the table, I can ensure victory by picking 
1 match at the beginning, bringing the number down to 29. 
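
The backward reasoning above can be reproduced mechanically. The following Python sketch is one way to do so, under the rules of Example 13.1; the function name `mover_wins` is hypothetical. It labels each number of remaining matches as winning or losing for the player about to move.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def mover_wins(n, max_take=3):
    """True if the player about to move can force a win with n matches left.

    Rule from Example 13.1: whoever picks up the last match loses.
    """
    if n == 1:
        return False  # forced to take the last match
    # Win if some legal move leaves the opponent in a losing position.
    return any(not mover_wins(n - k, max_take)
               for k in range(1, max_take + 1) if n - k >= 1)


# Positions in which the player to move loses against best play.
losing = [n for n in range(1, 31) if not mover_wins(n)]
print(losing)

# Opening moves from 30 matches that leave the opponent in a losing position.
first_moves = [k for k in (1, 2, 3) if not mover_wins(30 - k)]
print(first_moves)
```

The losing positions are exactly those listed in the solution, and the only winning opening move from 30 matches is to pick up 1 match.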


Example 13.2 (Knapsack problem) Given a set of items, each with a certain 
weight and value, select the collection of items with total maximum value such 
that their total weight does not exceed some fixed weight limit W. 


Solution. Let $w_t > 0$ and $v_t > 0$ be the weight and value respectively of item $t$ for
$t = 1, \ldots, n$. The knapsack problem can be formulated as an integer program and
solved via the technique covered in Chapter 8.2. We next illustrate an alternative
approach via dynamic programming. Consider the problem as a sequence of
binary decisions $x_t \in \{0, 1\}$ corresponding to “include” or “do not include”
item $t$ for $t = 1, \ldots, n$. To find the optimal selection of items, we can work
“backwards” as we did in the matches puzzle. Let $W_t$ be the remaining amount
of weight available at stage $t = 1, \ldots, n$ with $W_1 = W$, and let $J_t(W_t)$ denote
the value of an optimal collection of items if we started selecting items at stage $t$
with remaining weight limit $W_t$. The value function $J_t(W_t)$ satisfies the following
backward recursion for $t = 1, 2, \ldots, n - 1$:

$$J_t(W_t) = \begin{cases}
J_{t+1}(W_t) & \text{if } w_t > W_t \\
\max\{J_{t+1}(W_t),\; J_{t+1}(W_t - w_t) + v_t\} & \text{if } w_t \le W_t.
\end{cases} \tag{13.1}$$

Our goal is to obtain the value $J_1(W)$ and the corresponding optimal collection
of items. The steps that lead to $J_1(W)$ in the above recursion are tied to the
optimal decisions $x_t^*$ for $t = 1, \ldots, n - 1$. On the one hand, $x_t^* = 0$ corresponds
to $J_t(W_t) = J_{t+1}(W_t)$; that is, do not select item $t$. On the other hand, $x_t^* = 1$
corresponds to $J_t(W_t) = J_{t+1}(W_t - w_t) + v_t$. Observe that for $0 \le W_n \le W$ the
last-stage value function satisfies

$$J_n(W_n) = \begin{cases}
0 & \text{if } w_n > W_n \\
v_n & \text{if } w_n \le W_n.
\end{cases}$$
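
The recursion (13.1) translates directly into a table-filling procedure when the weights are integers. The sketch below is an illustration under that assumption, with hypothetical data and a hypothetical function name. For convenience it adds an extra terminal stage with value zero instead of treating the last item separately, which gives the same values.

```python
def knapsack(weights, values, W):
    """Backward recursion (13.1) with integer weights.

    J[t][c] is the best value achievable from items t, ..., n-1 (0-based)
    when the remaining capacity is c.
    """
    n = len(weights)
    J = [[0] * (W + 1) for _ in range(n + 1)]  # J[n][c] = 0: no items left
    for t in range(n - 1, -1, -1):
        for c in range(W + 1):
            J[t][c] = J[t + 1][c]                       # x_t = 0
            if weights[t] <= c:                         # x_t = 1 is feasible
                J[t][c] = max(J[t][c],
                              J[t + 1][c - weights[t]] + values[t])
    # Recover the decisions by walking forward through the stages.
    x, c = [], W
    for t in range(n):
        take = (weights[t] <= c
                and J[t][c] == J[t + 1][c - weights[t]] + values[t])
        x.append(1 if take else 0)
        if take:
            c -= weights[t]
    return J[0][W], x


# Hypothetical data, for illustration only.
best, x = knapsack(weights=[10, 20, 30], values=[60, 100, 120], W=50)
print(best, x)
```

The table has $(n + 1)(W + 1)$ entries, so the work grows with the capacity $W$ as well as with the number of items.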


Example 13.3 (Optimal consumption problem) Assume that now (beginning 
of year 0) you have an initial amount of wealth $W_0 > 0$. At the beginning of
year $t$ you choose to consume $C_t$ dollars and invest the rest of your wealth in
one-year treasury bills. You can consume at most the wealth available in year $t$.
Consuming $C_t$ in year $t$ provides a utility $U(C_t)$. On the other hand, each dollar
invested in one-year treasury bills yields $1 + r$ dollars cash at the beginning of the
next year. Suppose you want to maximize your total utility of consumption over
the next $T$ years:

$$\max \sum_{t=0}^{T} U(C_t).$$

How much should you consume each year? 


Solution. The key to solving the optimal consumption problem is again to work 
“backwards” in time just like we did in the matches puzzle and knapsack problem. 
Let $W_t$ denote the amount of wealth available at the beginning of year $t$ and let
$J_t(W_t)$ be the maximum total utility of consumption from year $t$ to year $T$ if we start
at year $t$ with wealth $W_t$. The value function $J_t(W_t)$ satisfies the following backwards
recursion for $t = 0, 1, 2, \ldots, T - 1$:

$$J_t(W_t) = \max_{0 \le C_t \le W_t} \left\{ J_{t+1}\big((W_t - C_t)(1 + r)\big) + U(C_t) \right\} \tag{13.2}$$

and the maximizer $C_t^*$ is the optimal consumption level at year $t$. Observe that
for $W_T > 0$ the last-stage value function satisfies

$$J_T(W_T) = U(W_T),$$

attained at the optimal consumption level $C_T^* = W_T$.
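
To see the recursion (13.2) at work, the following Python sketch solves a small instance by brute force. It rests on simplifying assumptions that are not part of the example: wealth and consumption are whole units, the interest rate is $r = 0$ (so that wealth stays on the integer grid), and the utility is a square root. The function name and data are hypothetical.

```python
import math


def solve_consumption(W0, T, U):
    """Backward recursion (13.2) on an integer wealth grid, assuming r = 0.

    Returns value tables J[t][w] and decision rules C[t][w] for t = 0..T.
    """
    J = [[0.0] * (W0 + 1) for _ in range(T + 1)]
    C = [[0] * (W0 + 1) for _ in range(T + 1)]
    for w in range(W0 + 1):           # last stage: consume everything
        J[T][w], C[T][w] = U(w), w
    for t in range(T - 1, -1, -1):
        for w in range(W0 + 1):
            best_c = max(range(w + 1),
                         key=lambda c: U(c) + J[t + 1][w - c])
            C[t][w] = best_c
            J[t][w] = U(best_c) + J[t + 1][w - best_c]
    return J, C


# Hypothetical instance: 12 units of wealth, years 0, 1, 2 (T = 2).
J, C = solve_consumption(W0=12, T=2, U=math.sqrt)

# Roll the decision rules forward from the initial wealth.
w, plan = 12, []
for t in range(3):
    plan.append(C[t][w])
    w -= C[t][w]
print(J[0][12], plan)
```

With a concave utility and no interest, the computed plan spreads consumption evenly over the three years, as one would expect.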


Model of a Sequential System (Deterministic Case) 


We next introduce the formal notation and terminology of dynamic program- 
ming. The presentation follows the approach popularized in the classical book of 
Bertsekas (2005). For ease of exposition, we first consider the deterministic case. 
That is, the context without random components. 

A sequential system is defined by the following elements. 


Stages: These are the points in time when decisions are made. We will normally
consider $t = 0, 1, \ldots, T$ or $t = 1, 2, \ldots, T$.

States: The state of the system at a particular stage is the information that is
relevant for subsequent decisions. We will generally denote the state at
stage $t$ as $s_t$, for $t = 0, 1, \ldots, T$. Sometimes it is convenient to include
also a “final state” $s_{T+1}$.

Decisions: These are the choices, also called controls or actions, that we can make
at each stage and that affect the behavior of the system. We will generally denote
the decisions as $x_t$, for $t = 0, 1, \ldots, T$.

Law of motion: This defines how the state of the system evolves. A general
law of motion has the form

$$s_{t+1} = f_t(s_t, x_t), \quad t = 0, 1, \ldots, T.$$


Assume we are interested in optimizing some overall objective function

$$\sum_{t=0}^{T} g_t(s_t, x_t) + g_{T+1}(s_{T+1}), \tag{13.3}$$

where each $g_t(s_t, x_t)$, for $t = 0, 1, \ldots, T$, and $g_{T+1}(s_{T+1})$ is some cost or reward
per stage. This defines a sequential decision problem: find $x_t$, for $t = 0, 1, \ldots, T$,
to minimize the total cost or maximize the reward (13.3).
Both Examples 13.2 and 13.3 can be readily stated in this framework. 


Dynamic programming formulation for the knapsack problem 

Stages: $t = 1, 2, \ldots, n$.

State at stage $t$: remaining weight capacity $W_t$.

Decision at stage $t$: binary variable $x_t \in \{0, 1\}$ indicating whether to include
item $t$ or not. This decision is constrained to be $x_t = 0$ if $w_t > W_t$ as in
this case the weight of item $t$ exceeds the remaining weight capacity.

Law of motion: the remaining weight capacity at stage $t + 1$ is the one from
stage $t$ reduced by $w_t$ if item $t$ is included. Otherwise they are the same.
More precisely,

$$W_{t+1} = W_t - w_t x_t, \quad t = 1, 2, \ldots, n - 1.$$

Objective: maximize the total value of the selected items

$$\sum_{t=1}^{n} v_t x_t.$$


Dynamic programming formulation for the optimal consumption problem 

Stages: $t = 0, 1, 2, \ldots, T$.

State at stage $t$: available wealth $W_t$. It is also convenient to assume that
terminal wealth $W_{T+1} = 0$.

Decision at stage $t$: consumption $C_t \in [0, W_t]$.

Law of motion: the wealth at stage $t + 1$ is the portion of wealth from stage $t$
that was not consumed increased by a factor $1 + r$. More precisely,

$$W_{t+1} = (W_t - C_t)(1 + r), \quad t = 0, 1, 2, \ldots, T.$$

Objective: maximize the total utility of consumption

$$\sum_{t=0}^{T} U(C_t).$$


Bellman’s Principle of Optimality 


The heart of dynamic programming is a principle of optimality due to Bellman. 
Its flavor was suggested by the solutions to Examples 13.1, 13.2, and 13.3. 
To state the principle precisely, we need a bit of notation. Suppose we are
maximizing total reward

$$J(s_0) := \max_{x_0, \ldots, x_T} \left[ \sum_{t=0}^{T} g_t(s_t, x_t) + g_{T+1}(s_{T+1}) \right].$$

Consider the “tail problem” that starts at stage $t$:

$$J_t(s_t) := \max_{x_t, \ldots, x_T} \left[ \sum_{\tau=t}^{T} g_\tau(s_\tau, x_\tau) + g_{T+1}(s_{T+1}) \right].$$

Bellman’s optimality principle can be stated as follows. The value-to-go functions
$J_t(s_t)$ satisfy the recursive relationship

$$J_t(s_t) = \max_{x_t} \left\{ g_t(s_t, x_t) + J_{t+1}(f_t(s_t, x_t)) \right\}. \tag{13.4}$$


The recursive relationship (13.4) is called the Bellman equation. Observe that 
the recursive relationships (13.1) and (13.2) are exactly the Bellman equation 
(13.4) in the particular context of Examples 13.2 and 13.3 respectively. 

There is a certain jargon associated with the solution to a sequential decision
problem and Bellman’s optimality principle. The function $J_t(s_t)$ is called the
value-to-go function at stage $t$. If the objective is to minimize a total cost,
sometimes it is called the cost-to-go function. The solution $x_t^*(s_t)$ of Bellman’s
equation (13.4) at stage $t$ is called an optimal decision rule at stage $t$. Notice that
this solution depends on the state $s_t$ at stage $t$. The vector of optimal decision
rules $(x_0^*(\cdot), \ldots, x_T^*(\cdot))$ is called the optimal policy.


Bellman’s optimality principle can be phrased as:


If $(x_0^*(\cdot), \ldots, x_T^*(\cdot))$ is an optimal policy for the entire problem, then
$(x_t^*(\cdot), \ldots, x_T^*(\cdot))$ is an optimal policy for the tail problem beginning at
stage $t$.

Linear-Quadratic Regulator


We next illustrate Bellman’s optimality principle with a popular model from
control engineering called the linear-quadratic regulator. It provides the foundation
for a model of dynamic investment with transaction costs and predictable
returns that we will discuss in the next chapter. The linear-quadratic regulator
is a model for the problem of steering the location $s_t$ of an object towards the
origin via a control input $u_t$. Instead of a constraint on the location of the object,
the linear-quadratic regulator imposes a penalty for deviating from the origin.
Assume the states and controls evolve according to the following linear law of
motion:

$$s_{t+1} = A s_t + B u_t, \quad t = 0, 1, \ldots, N - 1.$$


Assume we have a quadratic cost function

$$\sum_{t=0}^{N-1} \left( s_t^{\top} Q s_t + u_t^{\top} R u_t \right) + s_N^{\top} Q s_N,$$

where $Q$, $R$ are symmetric positive definite matrices of appropriate sizes.

The goal is to determine the optimal sequence of controls $u_t$, $t = 0, 1, \ldots, N - 1$,
that minimizes the above cost when the initial position of the object is $s_0$:

$$J(s_0) := \min_{u_0, \ldots, u_{N-1}} \left\{ \sum_{t=0}^{N-1} \left( s_t^{\top} Q s_t + u_t^{\top} R u_t \right) + s_N^{\top} Q s_N \right\}.$$

We next apply the backwards dynamic programming principle. For the last stage
$N$ we evidently have

$$J_N(s_N) = s_N^{\top} Q s_N.$$
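For stage $N - 1$ we have the Bellman equation

$$\begin{aligned}
J_{N-1}(s_{N-1}) &= \min_{u_{N-1}} \left\{ s_{N-1}^{\top} Q s_{N-1} + u_{N-1}^{\top} R u_{N-1} + J_N(s_N) \right\} \\
&= \min_{u_{N-1}} \left\{ s_{N-1}^{\top} Q s_{N-1} + u_{N-1}^{\top} R u_{N-1} + (A s_{N-1} + B u_{N-1})^{\top} Q (A s_{N-1} + B u_{N-1}) \right\} \\
&= \min_{u_{N-1}} \left\{ s_{N-1}^{\top} Q s_{N-1} + s_{N-1}^{\top} A^{\top} Q A s_{N-1} + 2 s_{N-1}^{\top} A^{\top} Q B u_{N-1} + u_{N-1}^{\top} (R + B^{\top} Q B) u_{N-1} \right\}.
\end{aligned}$$

The latter is a convex quadratic function of $u_{N-1}$. To find its minimum, we
compute its gradient and equate it to zero to obtain:

$$2 B^{\top} Q A s_{N-1} + 2 (R + B^{\top} Q B) u_{N-1} = 0.$$

Thus, the optimal control at stage $N - 1$ is

$$u_{N-1}^* = -(R + B^{\top} Q B)^{-1} B^{\top} Q A s_{N-1} = L_{N-1} s_{N-1},$$

where

$$L_{N-1} = -(R + B^{\top} Q B)^{-1} B^{\top} Q A.$$

Plugging this value of $u_{N-1}^*$ in the above expression for $J_{N-1}(s_{N-1})$ we get

$$\begin{aligned}
J_{N-1}(s_{N-1}) &= s_{N-1}^{\top} Q s_{N-1} + s_{N-1}^{\top} A^{\top} Q A s_{N-1} - s_{N-1}^{\top} A^{\top} Q B (R + B^{\top} Q B)^{-1} B^{\top} Q A s_{N-1} \\
&= s_{N-1}^{\top} K_{N-1} s_{N-1},
\end{aligned}$$

where

$$K_{N-1} = Q + A^{\top} \left( Q - Q B (R + B^{\top} Q B)^{-1} B^{\top} Q \right) A.$$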
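Next we will prove by induction that

$$J_t(s_t) = s_t^{\top} K_t s_t, \qquad u_t^* = L_t s_t,$$

where

$$K_N = Q, \qquad K_t = Q + A^{\top} \left( K_{t+1} - K_{t+1} B (R + B^{\top} K_{t+1} B)^{-1} B^{\top} K_{t+1} \right) A, \quad t = N - 1, \ldots, 0,$$

and

$$L_t = -(R + B^{\top} K_{t+1} B)^{-1} B^{\top} K_{t+1} A, \quad t = N - 1, \ldots, 0.$$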
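We already showed that the above holds for $t = N - 1$. Assume that it holds for
$t + 1$. At stage $t$ we have the Bellman equation

$$\begin{aligned}
J_t(s_t) &= \min_{u_t} \left\{ s_t^{\top} Q s_t + u_t^{\top} R u_t + J_{t+1}(s_{t+1}) \right\} \\
&= \min_{u_t} \left\{ s_t^{\top} Q s_t + u_t^{\top} R u_t + (A s_t + B u_t)^{\top} K_{t+1} (A s_t + B u_t) \right\} \\
&= \min_{u_t} \left\{ s_t^{\top} Q s_t + s_t^{\top} A^{\top} K_{t+1} A s_t + 2 s_t^{\top} A^{\top} K_{t+1} B u_t + u_t^{\top} (R + B^{\top} K_{t+1} B) u_t \right\}.
\end{aligned}$$

The latter is a convex quadratic function of $u_t$. To find its minimum, we compute
its gradient and equate it to zero to obtain:

$$2 B^{\top} K_{t+1} A s_t + 2 (R + B^{\top} K_{t+1} B) u_t = 0.$$

Thus, the optimal control at stage $t$ is

$$u_t^* = -(R + B^{\top} K_{t+1} B)^{-1} B^{\top} K_{t+1} A s_t = L_t s_t,$$

where

$$L_t = -(R + B^{\top} K_{t+1} B)^{-1} B^{\top} K_{t+1} A.$$

Plugging this value of $u_t^*$ in the above expression for $J_t(s_t)$ we get

$$\begin{aligned}
J_t(s_t) &= s_t^{\top} Q s_t + s_t^{\top} A^{\top} K_{t+1} A s_t - s_t^{\top} A^{\top} K_{t+1} B (R + B^{\top} K_{t+1} B)^{-1} B^{\top} K_{t+1} A s_t \\
&= s_t^{\top} K_t s_t,
\end{aligned}$$

where

$$K_t = Q + A^{\top} \left( K_{t+1} - K_{t+1} B (R + B^{\top} K_{t+1} B)^{-1} B^{\top} K_{t+1} \right) A.$$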
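
The backward recursion for $K_t$ and $L_t$ is easy to run numerically. The following Python sketch treats the scalar case only, so that $A$, $B$, $Q$, $R$ are plain numbers; the data and function names are hypothetical. It checks that the cost realized by the feedback rule $u_t = L_t s_t$ matches the predicted value $K_0 s_0^2$.

```python
def lqr_backward(A, B, Q, R, N):
    """Scalar Riccati recursion: returns lists K[0..N] and L[0..N-1]."""
    K = [0.0] * (N + 1)
    L = [0.0] * N
    K[N] = Q
    for t in range(N - 1, -1, -1):
        M = R + B * K[t + 1] * B
        L[t] = -B * K[t + 1] * A / M
        K[t] = Q + A * (K[t + 1] - K[t + 1] * B * B * K[t + 1] / M) * A
    return K, L


def simulate(A, B, Q, R, s0, L):
    """Apply u_t = L_t s_t and return the realized total cost."""
    s, cost = s0, 0.0
    for Lt in L:
        u = Lt * s
        cost += Q * s * s + R * u * u
        s = A * s + B * u
    return cost + Q * s * s  # terminal cost Q * s_N^2


# Hypothetical scalar data, for illustration only.
A, B, Q, R, N, s0 = 1.1, 0.5, 1.0, 2.0, 10, 3.0
K, L = lqr_backward(A, B, Q, R, N)
print(K[0] * s0 ** 2, simulate(A, B, Q, R, s0, L))
```

Replacing the gains by any other sequence, for instance `[1.1 * l for l in L]`, should not produce a lower simulated cost.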


13.5 Sequential Decision Problem with Infinite Horizon


Infinite horizon problems are often appropriate models for problems where there 
is no terminal stage, such as investments for an endowment or a foundation. They 
are also often appropriate to model problems with very long time horizons. The 
infinite horizon setting tends to simplify some issues since the dependence of the 
value function on t can be eliminated. 

Consider an infinite horizon problem whose law of motion is of the form

$$s_{t+1} = f(x_t, s_t)$$

and whose objective function is

$$\max_{x_0, x_1, \ldots} \sum_{t=0}^{\infty} \theta^t \cdot g(x_t, s_t),$$

where $\theta \in (0, 1)$ is a given discount factor. Define the value-to-go function $V(\cdot)$ as

$$V(s_0) := \max_{x_0, x_1, \ldots} \sum_{t=0}^{\infty} \theta^t \cdot g(x_t, s_t).$$

Observe that at any intermediate stage $t$ we have

$$V(s_t) := \max_{x_t, x_{t+1}, \ldots} \sum_{\tau=t}^{\infty} \theta^{\tau - t} \cdot g(x_\tau, s_\tau).$$

Thus, in this case the Bellman equation can be written as

$$V(s_t) = \max_{x_t} \left\{ g(x_t, s_t) + \theta \cdot V(s_{t+1}) \right\}.$$


Linear-Quadratic Regulator with Infinite Horizon 


Consider now the infinite horizon version of the linear-quadratic regulator that
we discussed in Section 13.4. The goal now is to determine the optimal sequence
of controls $u_t$, $t = 0, 1, \ldots$, that minimizes the following cost:

$$V(s_0) := \min_{u_0, u_1, \ldots} \left[ \sum_{t=0}^{\infty} \left( s_t^{\top} Q s_t + u_t^{\top} R u_t \right) \right].$$
Uug,Uj,... t=0 


A common technique to solve the Bellman equation (and similar differential 
equations) is “ansatz”, which can be loosely described as “make an educated 
guess and later verify”. In this problem, we try the following quadratic ansatz 
for the form of the value function: 


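$$V(s_t) = s_t^{\top} K s_t,$$

for some symmetric positive definite matrix $K$.
With this educated guess we now apply the Bellman equation (infinite horizon
case):

$$\begin{aligned}
V(s_t) &= \min_{u_t} \left[ s_t^{\top} Q s_t + u_t^{\top} R u_t + V(s_{t+1}) \right] \\
&= \min_{u_t} \left[ s_t^{\top} Q s_t + u_t^{\top} R u_t + (A s_t + B u_t)^{\top} K (A s_t + B u_t) \right] \\
&= \min_{u_t} \left[ s_t^{\top} Q s_t + s_t^{\top} A^{\top} K A s_t + 2 s_t^{\top} A^{\top} K B u_t + u_t^{\top} (R + B^{\top} K B) u_t \right].
\end{aligned}$$

The latter is a convex quadratic function of $u_t$. To find its minimum, we compute
its gradient and equate it to zero to obtain:

$$2 B^{\top} K A s_t + 2 (R + B^{\top} K B) u_t = 0.$$

Thus, the optimal control at stage $t$ is

$$u_t^* = -(R + B^{\top} K B)^{-1} B^{\top} K A s_t = L s_t,$$

where

$$L = -(R + B^{\top} K B)^{-1} B^{\top} K A.$$

Plugging this value of $u_t^*$ in the above Bellman equation we get

$$V(s_t) = s_t^{\top} Q s_t + s_t^{\top} A^{\top} \left( K - K B (R + B^{\top} K B)^{-1} B^{\top} K \right) A s_t.$$

Hence for the above guess to be correct, we must have:

$$K = Q + A^{\top} \left( K - K B (R + B^{\top} K B)^{-1} B^{\top} K \right) A.$$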


This is the so-called Riccati equation. Under suitable assumptions on $Q$, $R$, $A$, $B$,
this equation is known to have a unique symmetric positive definite solution $K$.

Consider the following special case: $A = B = I$ and $R = \lambda Q$ with $\lambda > 0$. In
this case the law of motion is

$$s_{t+1} = s_t + u_t$$

and the Riccati equation is

$$K = Q + K - K (\lambda Q + K)^{-1} K.$$

We thus obtain

$$Q = K (\lambda Q + K)^{-1} K.$$

To solve for $K$, try to find a solution of the form $K = aQ$. Plugging this in the
above equation yields

$$Q = \frac{a^2}{\lambda + a}\, Q.$$

This is a quadratic equation in $a$ with two roots, but only one that is positive,
namely

$$a = \frac{1 + \sqrt{1 + 4\lambda}}{2}.$$

Therefore we get

$$K = aQ = \frac{1 + \sqrt{1 + 4\lambda}}{2}\, Q,$$

and consequently

$$L = -\frac{1 + \sqrt{1 + 4\lambda}}{2\lambda + 1 + \sqrt{1 + 4\lambda}}\, I.$$

In particular, the optimal control at time $t$ is

$$u_t^* = -\frac{1 + \sqrt{1 + 4\lambda}}{2\lambda + 1 + \sqrt{1 + 4\lambda}}\, s_t.$$


Note that when $\lambda = 0$, there is no direct cost associated with the control variable
$u_t$ and therefore it is optimal to select $u_t$ to minimize the cost of $s_{t+1} = s_t + u_t$,
which is given by $s_{t+1}^{\top} Q s_{t+1}$. Clearly, this is minimized when $s_{t+1} = 0$, or
equivalently, when $u_t = -s_t$. On the other hand, for $\lambda > 0$, the cost $\lambda u_t^{\top} Q u_t$
keeps $u_t$ from reaching all the way to $-s_t$. Instead, $u_t$ is a scalar multiple of $-s_t$,
where the scalar multiple is less than 1. In addition, the larger $\lambda$, the higher the
cost of the control variable $u_t$, and therefore the smaller this scalar multiple.
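
The closed form for this special case can be cross-checked against a direct iteration of the Riccati equation. The sketch below assumes the scalar case with $Q = 1$, so that $K = a$; the function names are hypothetical.

```python
import math


def riccati_fixed_point(lam, iterations=200):
    """Iterate K <- Q + K - K^2 / (lam*Q + K) with Q = 1 (scalar case)."""
    K = 1.0
    for _ in range(iterations):
        K = 1.0 + K - K * K / (lam + K)
    return K


def closed_form(lam):
    """Return a = (1 + sqrt(1 + 4 lam)) / 2 and the gain in u_t = gain * s_t."""
    a = (1.0 + math.sqrt(1.0 + 4.0 * lam)) / 2.0
    gain = -a / (lam + a)
    return a, gain


for lam in (0.0, 0.5, 2.0, 10.0):
    a, gain = closed_form(lam)
    print(lam, riccati_fixed_point(lam), a, gain)
```

The printed gains move from $-1$ at $\lambda = 0$ towards $0$ as $\lambda$ grows, in line with the discussion above.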


Model of Sequential System (Stochastic Case)


The above dynamic programming machinery has a straightforward extension to a 
more general context that includes a stochastic component in the law of motion. 

A stochastic sequential system is an extension of the deterministic case. Like a 
deterministic sequential system, the main components of a stochastic sequential 
system are stages, states, decisions, and law of motion. The first three are exactly 
as before. On the other hand, the law of motion of a stochastic sequential system 
is of the more general form 


$$s_{t+1} = f_t(s_t, x_t, \omega_t), \quad t = 0, 1, \ldots, T.$$

As before, $s_t$, $x_t$ are the state and action at stage $t$ and $s_{t+1}$ is the state at stage
$t + 1$. In addition, $\omega_t$ is some random disturbance that occurs at stage $t$.
Assume we are interested in optimizing some overall objective function

$$E\left[ \sum_{t=0}^{T} g_t(s_t, x_t, \omega_t) + g_{T+1}(s_{T+1}) \right], \tag{13.5}$$

where each $g_t(s_t, x_t, \omega_t)$, $t = 0, 1, \ldots, T$, and $g_{T+1}(s_{T+1})$ is a cost or a reward
per stage. This defines a stochastic sequential decision problem: find $x_t$,
$t = 0, 1, \ldots, T$, to minimize or maximize the expected total cost or reward (13.5).

Bellman’s optimality principle also extends in a natural fashion. Suppose we
are maximizing the expected reward

$$J(s_0) = \max_{x_0, \ldots, x_T} E\left[ \sum_{t=0}^{T} g_t(s_t, x_t, \omega_t) + g_{T+1}(s_{T+1}) \right].$$

Consider the “tail problem” that starts at stage $t$:

$$J_t(s_t) := \max_{x_t, \ldots, x_T} E\left[ \sum_{\tau=t}^{T} g_\tau(s_\tau, x_\tau, \omega_\tau) + g_{T+1}(s_{T+1}) \right].$$

Bellman’s optimality principle can be stated as follows. The value-to-go functions
$J_t(s_t)$ satisfy the following Bellman equation:

$$J_t(s_t) = \max_{x_t} E_{\omega_t}\left[ g_t(s_t, x_t, \omega_t) + J_{t+1}(f_t(s_t, x_t, \omega_t)) \right]. \tag{13.6}$$
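
A small repeated-betting problem illustrates the stochastic Bellman equation (13.6). The setup below is an assumption made for illustration and does not come from the text: wealth takes whole-dollar values on a bounded grid, the decision is the stake, the disturbance is a win or loss of the stake, there is no running reward, and the terminal reward is a square-root utility of wealth.

```python
import math


def solve_betting(W_max, T, p, U):
    """Backward recursion for (13.6) on integer wealth levels 0..W_max.

    Decision: stake b in whole dollars. Disturbance: win (+b) with
    probability p, lose (-b) otherwise. Terminal reward: U(wealth).
    """
    J = [[0.0] * (W_max + 1) for _ in range(T + 1)]
    bet = [[0] * (W_max + 1) for _ in range(T)]
    for w in range(W_max + 1):
        J[T][w] = U(w)
    for t in range(T - 1, -1, -1):
        for w in range(W_max + 1):
            best_b, best_v = 0, J[t + 1][w]            # b = 0: do not bet
            for b in range(1, min(w, W_max - w) + 1):  # stay on the grid
                # Expectation over the two outcomes of the disturbance.
                v = p * J[t + 1][w + b] + (1 - p) * J[t + 1][w - b]
                if v > best_v:
                    best_b, best_v = b, v
            J[t][w], bet[t][w] = best_v, best_b
    return J, bet


# Hypothetical instance: three rounds, wealth capped at 40 dollars.
J_fair, bet_fair = solve_betting(W_max=40, T=3, p=0.5, U=math.sqrt)
J_edge, bet_edge = solve_betting(W_max=40, T=3, p=0.7, U=math.sqrt)
print(bet_fair[0][10], J_fair[0][10])
print(bet_edge[0][10], J_edge[0][10])
```

With a fair coin and a concave utility the recursion recommends not betting at all, whereas a favorable coin leads to a positive stake and a higher value-to-go.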


The stochastic case also has an infinite horizon version. Consider an infinite 
horizon problem with a law of motion of the form 


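$$s_{t+1} = f(x_t, s_t, \omega_t)$$

and objective function

$$\max_{x_0, x_1, \ldots} E\left[ \sum_{t=0}^{\infty} \theta^t \cdot g(x_t, s_t, \omega_t) \right],$$

where $\theta \in (0, 1)$ is a given discount factor.
Define the value-to-go function $V(\cdot)$ as

$$V(s_0) := \max_{x_0, x_1, \ldots} E\left[ \sum_{t=0}^{\infty} \theta^t \cdot g(x_t, s_t, \omega_t) \right].$$

Observe that at any intermediate stage $t$ we have

$$V(s_t) = \max_{x_t, x_{t+1}, \ldots} E\left[ \sum_{\tau=t}^{\infty} \theta^{\tau - t} \cdot g(x_\tau, s_\tau, \omega_\tau) \right].$$

In this case the Bellman equation can be written as

$$V(s_t) = \max_{x_t} E_{\omega_t}\left[ g(x_t, s_t, \omega_t) + \theta \cdot V(s_{t+1}) \right].$$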


Notes 


Dynamic programming was introduced by Bellman (1954, 1957), who stated 
the fundamental principle of optimality. Dynamic programming is pervasive in 
many disciplines, including finance, economics, biology, management, etc. The 
book of Bertsekas (2005) is a popular modern reference on this topic. The book 
by Porteus (2002) gives a treatment on dynamic programming with focus on 
inventory theory. 


Exercises 


Exercise 13.1 Consider the following puzzle. There are 40 matches on a table. 
You begin by picking up 1, 2, 3, or 4 matches. Then your opponent must pick 1, 
2, 3, or 4 matches. The two of you continue taking turns until the last match is 
picked up. The player who picks up the last match loses. 


(a) Can you find a strategy that guarantees your victory? If so, how? 

(b) What if the initial number of matches is 39, 38, 37, or 36 instead of 40? 

(c) Suppose the game starts with 40 matches and you and your opponent take 
turns as above but the player who picks up the last match wins. Can you 
find a strategy that guarantees your victory? If so, how? 


Exercise 13.2 Consider the following capital budgeting example from 
Chapter 8: 

$$\begin{array}{rl}
\max & 9x_1 + 11x_2 + 7x_3 + 4x_4 \\
\text{s.t.} & 7x_1 + 10x_2 + 6x_3 + 3x_4 \le 19 \\
 & x_j \in \{0, 1\}, \quad j = 1, \ldots, 4.
\end{array}$$

Observe that this is a knapsack problem. Prove that the vector $x^* = [0 \; 1 \; 1 \; 1]^{\top}$
is an optimal solution to this problem by showing that it satisfies the Bellman
equation.




Exercise 13.3 Consider the optimal consumption problem described in
Example 13.3. Suppose the consumer has a logarithmic utility of consumption
$U(C_t) = \log(C_t)$.


(a) Show that the optimal consumption and value-to-go function at stage $T$ are
respectively $C_T^*(W_T) = W_T$ and $J_T(W_T) = \log(W_T)$.

(b) Assume $r = 0$. Use the Bellman equation and induction to show that the
optimal consumption and value-to-go function at stages $t = 0, 1, \ldots, T - 1$
are respectively $C_t^*(W_t) = W_t/(T - t + 1)$ and
$J_t(W_t) = (T - t + 1) \log\big(W_t/(T - t + 1)\big)$.

(c) Assume r > 0. Use the Bellman equation and induction to find the optimal 
consumption and value-to-go function at stages t = 0,1,..., T — 1. 


Exercise 13.4 Consider the following infinite horizon variation of the previous 
consumption problem. Assume the consumer lives forever and her objective is to 
maximize the following total discounted utility of consumption

$$\sum_{t=0}^{\infty} \theta^t \cdot \log(C_t)$$

for some $\theta \in (0, 1)$. Use the following educated guess (“ansatz”) for the optimal
value function:

$$V(W_t) = a \cdot \log(W_t) + b$$

for some constants $a$, $b$ with $a > 0$. Use the Bellman equation to verify this
educated guess and determine the optimal consumption rule $C_t^*(W_t)$ and optimal
value function $V(W_t)$ (that is, the values of $a$ and $b$). For simplicity, assume $r = 0$.


Exercise 13.5 Consider the following variation of the optimal consumption
problem described in Example 13.3. At each stage $t$ the amount of non-consumed
wealth $W_t - C_t$ can be split between treasury bills and an index fund. Funds
placed in treasury bills earn an annual risk-free return $r > 0$ whereas the funds
placed in the index fund earn a risky return $r_t$ with expected value $E(r_t) = \mu > r$
and variance $\mathrm{var}(r_t) = \sigma^2 > 0$. Assume the returns are i.i.d. across different
periods.

Use dynamic programming to formulate the following optimal investment and
consumption problem: Determine the consumption $C_t \in [0, W_t]$ and fraction of
wealth $x_t \in \mathbb{R}$ invested in the index fund at stage $t = 0, 1, \ldots, T$ that maximize
the total expected utility of consumption

$$\max\; E\left[ \sum_{t=0}^{T} U(C_t) \right]$$

over the next $T$ years. Proceed as follows.


(a) Write the law of motion; that is, the equation that describes the state $W_{t+1}$
in terms of the state $W_t$ and decisions $C_t$, $x_t$ at stage $t$.

(b) Write the Bellman equation for the value-to-go function $J_t(W_t)$.

(c) Consider the special case of logarithmic utility of consumption $U(C_t) =
\log(C_t)$. Use the Bellman equation and induction to determine the optimal
consumption $C_t^*$ and investment fraction $x_t^*$ as well as the value-to-go
function $J_t(W_t)$ at stage $t$ for $t = 0, 1, \ldots, T$.

(d) Indicate how your model changes if the fraction of wealth $x_t$ invested in the
index fund at each stage $t$ is subject to the constraint $x_t \in [0, 1]$. (That is,
no leverage is allowed in the investment portfolio.)
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Dynamic Programming Models: 
Multi-Period Portfolio Optimization 


This chapter describes four types of dynamic portfolio optimization problems 
that are amenable to dynamic programming technology. The first two deal 
respectively with optimization of final wealth and its extension, optimal 
consumption and investment. These two classical models date back multiple 
decades. The last two problems are much more modern developments. One of 
them is a model for dynamic trading when returns are predictable and trading 
is costly. The other one is a model for dynamic portfolio optimization that 
incorporates capital gains taxes. 


Utility of Terminal Wealth 


Let us revisit the dynamic portfolio optimization model with initial endowment 
and utility of terminal wealth that we discussed in Chapter 12. However, this 
time we consider a more general setting where there are forecasting variables 
available at each stage. Such forecasting variables could be associated with a 
factor model. For instance, they could be macroeconomic indicators, or certain 
measurable parameters of a particular asset or firm. 

As before, suppose that an investor starts at $t = 0$ with an initial endowment
$W_0$. At times $t = 0,\dots,T-1$ the investor invests her wealth $W_t$ in a portfolio of
risk-free and risky assets. The investor’s goal is to maximize the expected utility
of terminal wealth $U(W_T)$ at time $T$ for some utility function $U(\cdot)$. Define the
following convenient notation:

$R_{f,t+1}$ = gross risk-free return in period $[t, t+1]$;
$r_{t+1}$ = vector of excess returns of the risky assets in period $[t, t+1]$;
$R_{p,t+1}$ = gross random return of the investor’s portfolio in period $[t, t+1]$;
$z_t$ = forecasting state variables available at stage $t$;
$W_t$ = wealth at stage $t$.

We have the inter-temporal budget constraint:

$$W_{t+1} = W_t \cdot R_{p,t+1} = W_t \cdot (R_{f,t+1} + r_{t+1}^{\mathsf T} x_t), \quad t = 0,\dots,T-1.$$

The specific components of this sequential decision problem are as follows:


Stages: these are $t = 0,\dots,T-1$.

State at stage $t$: this is $(W_t, z_t)$.

Decision variables at stage $t$: these are the vector $x_t$ of portfolio holdings
(percentages) in the risky assets.

Law of motion: this is the same as the above inter-temporal constraint

$$W_{t+1} = W_t \cdot (R_{f,t+1} + r_{t+1}^{\mathsf T} x_t), \quad t = 0,\dots,T-1.$$


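We next apply Bellman’s optimality principle. In this case the value-to-go
function is

$$J_t(W_t, z_t) = \max_{x_t,\dots,x_{T-1}} \mathrm{E}_t\left(U(W_T)\right) = \max_{x_t,\dots,x_{T-1}} \mathrm{E}_t\left[U\left(W_t \prod_{\tau=t}^{T-1} \left(R_{f,\tau+1} + r_{\tau+1}^{\mathsf T} x_\tau\right)\right)\right].$$

At the final stage $T$ we get

$$J_T(W_T, z_T) = U(W_T).$$

For earlier stages, we have the Bellman equation

$$J_t(W_t, z_t) = \max_{x_t} \mathrm{E}_t\left[J_{t+1}(W_{t+1}, z_{t+1})\right] = \max_{x_t} \mathrm{E}_t\left[J_{t+1}\left(W_t (R_{f,t+1} + r_{t+1}^{\mathsf T} x_t), z_{t+1}\right)\right].$$

In the special case of power utility $U(W) = W^{1-\gamma}/(1-\gamma)$, where $\gamma > 0$, we
rewrite the Bellman equation as follows. Define $\psi_t(z_t) := J_t(1, z_t)$. Since power
utility is homogeneous in wealth, $J_t(W_t, z_t) = W_t^{1-\gamma} \psi_t(z_t)$, and it is
easy to see that the Bellman equation is equivalent to

$$\psi_t(z_t) = \max_{x_t} \mathrm{E}_t\left[(R_{f,t+1} + r_{t+1}^{\mathsf T} x_t)^{1-\gamma} \cdot \psi_{t+1}(z_{t+1})\right].$$

We can draw the following interesting conclusions from here. On the one hand,
if $r_{t+1}$ and $z_{t+1}$ are independent at time $t$, then the term on the right-hand side
above satisfies

$$\mathrm{E}_t\left[(R_{f,t+1} + r_{t+1}^{\mathsf T} x_t)^{1-\gamma} \cdot \psi_{t+1}(z_{t+1})\right] = \mathrm{E}_t\left[(R_{f,t+1} + r_{t+1}^{\mathsf T} x_t)^{1-\gamma}\right] \cdot \mathrm{E}_t\left(\psi_{t+1}(z_{t+1})\right). \qquad (14.1)$$

Thus, to find $x_t$ we need to solve

$$\max_{x_t} \mathrm{E}_t\left[\frac{(R_{f,t+1} + r_{t+1}^{\mathsf T} x_t)^{1-\gamma}}{1-\gamma}\right] = \max_{x_t} \mathrm{E}_t\left[U(R_{p,t+1})\right].$$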
Xt re | Xt 


In this case the optimal policy is myopic. 
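To make the myopic problem concrete, the following minimal sketch solves the one-period problem of maximizing the expected power utility of the gross portfolio return for a single risky asset. It assumes, purely for illustration, a two-point excess return that equals `up` with probability `p` and `-down` otherwise; the function names and numerical inputs are hypothetical and do not come from the text. The first-order condition is solved by bisection and compared with the closed form that this two-point case happens to admit.

```python
def marginal_utility(x, Rf, up, down, p, gamma):
    """Derivative in x of E[(Rf + r*x)^(1-gamma) / (1-gamma)] for r in {up, -down}."""
    return (p * up * (Rf + up * x) ** (-gamma)
            - (1.0 - p) * down * (Rf - down * x) ** (-gamma))


def myopic_weight(Rf, up, down, p, gamma, iters=200):
    """Solve the first-order condition by bisection (the objective is concave in x)."""
    # Assumption: wealth must stay positive in both states, which bounds x.
    lo = -Rf / up * (1.0 - 1e-9)
    hi = Rf / down * (1.0 - 1e-9)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if marginal_utility(mid, Rf, up, down, p, gamma) > 0.0:
            lo = mid   # utility still increasing: the optimum lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def myopic_weight_closed_form(Rf, up, down, p, gamma):
    """Closed form for the two-point case, from the same first-order condition."""
    c = ((1.0 - p) * down / (p * up)) ** (1.0 / gamma)
    return Rf * (1.0 - c) / (down + c * up)


# Hypothetical inputs: gross risk-free return 1.02, excess return +20% or -10%, gamma = 3.
x_num = myopic_weight(1.02, 0.20, 0.10, 0.5, 3.0)
x_cf = myopic_weight_closed_form(1.02, 0.20, 0.10, 0.5, 3.0)
print(round(x_num, 4), round(x_cf, 4))
```

Because the return distribution here does not depend on any state variable, the same weight would be chosen at every stage, which is what the term myopic refers to.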

On the other hand, if $r_{t+1}$ and $z_{t+1}$ are correlated, then (14.1) no longer holds.
In this case $x_t$ may include some kind of “inter-temporal hedging component”.
The intuition is that the correlation between $r_{t+1}$ and $z_{t+1}$ would induce some
kind of serial dependence in our returns. In other words, the current forecasted
return $r_{t+1}$ conveys information about future returns. Unlike the myopic strategy,
the optimal dynamic strategy incorporates this serial dependence.


Optimal Consumption and Investment


Consider an extension of the previous dynamic portfolio optimization model 
where the goal is to maximize an expected utility that combines two terms: 
consumption along the planning horizon and terminal wealth. The latter com- 
ponent is sometimes called bequest. 

There are three key differences from the previous model. First, there is an
additional decision variable $C_t \in [0, W_t]$ at each stage $t$ that denotes the amount
of wealth the investor consumes at stage $t$. Second, the objective function is

$$\mathrm{E}\left[\sum_{t=0}^{T-1} U(C_t) + B(W_T)\right]$$

for some utility functions $U(C)$ and $B(W)$. Third, the new law of motion, or
inter-temporal budget constraint, is

$$W_{t+1} = (W_t - C_t) \cdot R_{p,t+1} = (W_t - C_t) \cdot (R_{f,t+1} + r_{t+1}^{\mathsf T} x_t), \quad t = 0,\dots,T-1.$$


To simplify our discussion we consider the case when there are no forecasting
variables $z_t$. In particular this implies that the returns on the risky assets are
independent across different time periods. At the final stage $T$ we have the
following value-to-go function

$$J_T(W_T) = B(W_T).$$

For earlier stages, we have the Bellman equation

$$J_t(W_t) = \max_{C_t, x_t} \mathrm{E}_t\left[J_{t+1}(W_{t+1}) + U(C_t)\right].$$

The first-order optimality conditions yield

$$U'(C_t) = \mathrm{E}_t\left[J'_{t+1}(W_{t+1}) R_{p,t+1}\right]$$

and

$$\mathrm{E}_t\left[J'_{t+1}(W_{t+1}) r_{t+1}\right] = 0.$$

The first one is obtained by differentiating with respect to $C_t$ and the second
one is obtained by differentiating with respect to $x_t$.

If we plug the optimal $C_t, x_t$ back into the Bellman equation and differentiate
with respect to the state variable $W_t$, we obtain the following envelope condition:

$$U'(C_t) = J'_t(W_t).$$


In the special case of a logarithmic utility of consumption and bequest,
$U(C) = \log(C)$ and $B(W) = \log(W)$, we can draw a more explicit conclusion about the
problem. In this case the Bellman equation yields the following expressions for
the value function and optimal consumption:

$$J_t(W_t) = (T - t + 1) \log(W_t) + b_t$$

and

$$C_t^*(W_t) = \frac{W_t}{T - t + 1}.$$

The specific value $b_t$ and the optimal portfolio $x_t^*(W_t)$ depend on the joint
probability distribution of $R_{f,t+1}$ and $r_{t+1}$. By contrast, the optimal consumption
$C_t^*(W_t)$ only depends on $W_t$.
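The consumption rule for logarithmic utility can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original model description: it uses the fact that, if the next-stage value function is a multiple `a_next` of the logarithm of wealth plus a constant, the part of the Bellman objective that depends on consumption is log(C) + a_next * log(W - C). The function names, the horizon, and the wealth level are hypothetical.

```python
def log_coefficients(T):
    """Coefficient a_t of log(W_t) in the value function: a_T = 1 and a_t = 1 + a_{t+1}."""
    a = [0.0] * (T + 1)
    a[T] = 1.0
    for t in range(T - 1, -1, -1):
        a[t] = 1.0 + a[t + 1]
    return a


def optimal_consumption(W, a_next, iters=200):
    """Maximize log(C) + a_next * log(W - C) over 0 < C < W by bisection on the derivative."""
    lo, hi = 0.0, W
    for _ in range(iters):
        C = 0.5 * (lo + hi)
        if 1.0 / C - a_next / (W - C) > 0.0:
            lo = C   # objective still increasing in C
        else:
            hi = C
    return 0.5 * (lo + hi)


# Hypothetical horizon and wealth level.
T, W = 5, 100.0
a = log_coefficients(T)
rule = [optimal_consumption(W, a[t + 1]) for t in range(T)]
print([round(c, 4) for c in rule])
```

Under these assumptions the consumed fraction of wealth depends only on the number of remaining stages and rises as the horizon approaches.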


Dynamic Trading with Predictable Returns and 
Transaction Costs 


We next discuss a recent model due to Gârleanu and Pedersen (2013) for dynamic 
portfolio optimization when asset returns are predictable by signals and trading 
is costly. This problem is quite timely and especially relevant for active investors. 
The optimal trading policy should balance various tradeoffs. Fast trading gen- 
erates more alpha and lower risk but also higher transaction costs. Slow trading 
does the opposite. On the other hand, there may be fast signals that require 
quick action and slow signals associated with longer-lasting alpha. The model 
that we discuss next provides an insightful solution to this problem. 

Consider a universe of assets, whose returns evolve according to the following
law of motion:

$$r_{t+1} = B f_t + u_{t+1}.$$

Here $f_t$ is a vector of factor returns that predict asset returns, $B$ is a matrix
of exposures or sensitivities of the asset returns to factor returns, and $u_{t+1}$ is an
idiosyncratic zero-mean noise term with constant covariance matrix

$$\mathrm{var}_t(u_{t+1}) = \Sigma.$$

The vector of factor returns $f_t$ is known to the investor at time $t$ and evolves
according to

$$\Delta f_{t+1} = -\Phi f_t + \varepsilon_{t+1},$$

where $\Delta f_{t+1} = f_{t+1} - f_t$.

Trading is costly. The transaction cost associated with trading the vector of
shares $\Delta x_t = x_t - x_{t-1}$ is

$$\mathrm{TC}(\Delta x_t) = \frac{1}{2} \Delta x_t^{\mathsf T} \Lambda \, \Delta x_t$$

for some symmetric positive definite matrix $\Lambda$.
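The model objective is

$$\max_{x_0, x_1, \dots} \mathrm{E}_0\left[\sum_{t=0}^{\infty} \left((1-\rho)^{t+1}\left(x_t^{\mathsf T} r_{t+1} - \frac{\gamma}{2} x_t^{\mathsf T} \Sigma x_t\right) - \frac{(1-\rho)^t}{2} \Delta x_t^{\mathsf T} \Lambda \, \Delta x_t\right)\right],$$

where $\rho \in (0,1)$ is a discount rate and $\gamma > 0$ is a risk-aversion coefficient.

Gârleanu and Pedersen (2013) apply a dynamic programming approach to
characterize the optimal trading strategy. We summarize the main results below.

The state at time $t$ is the pair $(x_{t-1}, f_t)$. The value-to-go function is

$$V(x_{t-1}, f_t) = \max_{x_t, x_{t+1}, \dots} \mathrm{E}_t\left[\sum_{\tau=t}^{\infty} \left((1-\rho)^{\tau-t+1}\left(x_\tau^{\mathsf T} r_{\tau+1} - \frac{\gamma}{2} x_\tau^{\mathsf T} \Sigma x_\tau\right) - \frac{(1-\rho)^{\tau-t}}{2} \Delta x_\tau^{\mathsf T} \Lambda \, \Delta x_\tau\right)\right].$$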


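Hence the Bellman equation is

$$V(x_{t-1}, f_t) = \max_{x_t} \left\{ -\frac{1}{2} \Delta x_t^{\mathsf T} \Lambda \, \Delta x_t + (1-\rho)\left(x_t^{\mathsf T} \mathrm{E}_t(r_{t+1}) - \frac{\gamma}{2} x_t^{\mathsf T} \Sigma x_t + \mathrm{E}_t\left[V(x_t, f_{t+1})\right]\right) \right\}.$$

We make an educated guess (ansatz), to be verified later, that the value function
has the following quadratic form:

$$V(x_t, f_{t+1}) = -\frac{1}{2} x_t^{\mathsf T} A_{xx} x_t + x_t^{\mathsf T} A_{xf} f_{t+1} + \frac{1}{2} f_{t+1}^{\mathsf T} A_{ff} f_{t+1} + a_0.$$

Using this ansatz, it can be shown that the optimal trading policy is

$$x_t = x_{t-1} + \Lambda^{-1} A_{xx} (\mathrm{aim}_t - x_{t-1}),$$

where

$$\mathrm{aim}_t = A_{xx}^{-1} A_{xf} f_t.$$

The Bellman equation also yields expressions for the matrices $A_{xx}$, $A_{xf}$, $A_{ff}$.
See the exercises at the end of the chapter. In the special case $\Lambda = \lambda \Sigma$ we obtain

$$x_t = \left(1 - \frac{a}{\lambda}\right) x_{t-1} + \frac{a}{\lambda} \, \mathrm{aim}_t,$$

where

$$a = \frac{-(\gamma(1-\rho) + \lambda\rho) + \sqrt{(\gamma(1-\rho) + \lambda\rho)^2 + 4\gamma\lambda(1-\rho)^2}}{2(1-\rho)}.$$
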
Next, we get a more explicit expression of the aim portfolio. To that end, first
observe that the myopic solution in the absence of transaction costs is precisely
the solution to the static Markowitz model at time $t$; that is,

$$\mathrm{Markowitz}_t = (\gamma\Sigma)^{-1} B f_t.$$

Again we consider the special case $\Lambda = \lambda\Sigma$. For $z := \gamma/(\gamma + a)$ we get

$$\mathrm{aim}_t = z \cdot \mathrm{Markowitz}_t + (1-z)\,\mathrm{E}_t(\mathrm{aim}_{t+1}) = \sum_{\tau=t}^{\infty} z(1-z)^{\tau-t}\, \mathrm{E}_t(\mathrm{Markowitz}_\tau).$$

Furthermore, the portfolio $\mathrm{aim}_t$ has a similar form to $\mathrm{Markowitz}_t$ provided the
forecasting signals are appropriately scaled down:

$$\mathrm{aim}_t = (\gamma\Sigma)^{-1} B \left(I + \frac{a}{\gamma}\Phi\right)^{-1} f_t.$$

The optimal strategy is characterized by two principles. First, aim in front of
the target. Second, trade partially towards the current aim. More precisely, the 
optimal updated portfolio is a linear combination of the existing portfolio and an 
aim portfolio. The latter is a weighted average of the current Markowitz portfolio 
(the moving target) and the expected Markowitz portfolios on all future dates 
(where the target is moving). 
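The two principles can be illustrated with a scalar sketch of the special case in which the transaction cost matrix is a multiple `lam` of the return variance. Everything below is an assumption made for illustration: a single asset with variance `sigma2`, a single signal `f` with exposure `beta` and mean-reversion rate `phi`, and hypothetical parameter values. The code evaluates the coefficient a, the aim portfolio, and one partial trade towards the aim.

```python
import math


def trading_coefficient(gamma, lam, rho):
    """Coefficient a for transaction costs proportional to risk (cost matrix = lam * variance)."""
    b = gamma * (1.0 - rho) + lam * rho
    return (-b + math.sqrt(b * b + 4.0 * gamma * lam * (1.0 - rho) ** 2)) / (2.0 * (1.0 - rho))


def markowitz(f, gamma, sigma2, beta):
    """Scalar Markowitz portfolio: beta * f / (gamma * sigma2)."""
    return beta * f / (gamma * sigma2)


def aim(f, gamma, sigma2, beta, phi, a):
    """Scalar aim portfolio: Markowitz with the signal scaled by 1 / (1 + a * phi / gamma)."""
    return markowitz(f, gamma, sigma2, beta) / (1.0 + a * phi / gamma)


def next_position(x_prev, f, gamma, sigma2, beta, phi, lam, rho):
    """Trade partially from x_prev towards the aim portfolio at rate a / lam."""
    a = trading_coefficient(gamma, lam, rho)
    rate = a / lam
    return (1.0 - rate) * x_prev + rate * aim(f, gamma, sigma2, beta, phi, a)


# Hypothetical parameter values, chosen only for illustration.
gamma, lam, rho = 5.0, 10.0, 0.05
sigma2, beta, phi = 0.04, 1.0, 0.2
a = trading_coefficient(gamma, lam, rho)
x_new = next_position(0.0, 0.02, gamma, sigma2, beta, phi, lam, rho)
print(a / lam, aim(0.02, gamma, sigma2, beta, phi, a), x_new)
```

In this sketch a faster-decaying signal (larger `phi`) lowers the aim relative to the Markowitz portfolio, and a larger `lam` lowers the trading rate `a / lam`.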


Dynamic Portfolio Optimization with Taxes 


Taxes pose a significant friction to most investors in financial markets. There 
are a variety of taxes that apply in different ways to income, dividends, and 
capital gains. It is common to ignore taxes in traditional finance and portfolio 
theory. This simplification is in part due to the difficulties involved in modeling 
the effects of taxes. 

Capital gains taxes introduce a peculiar type of challenge in portfolio management.
Since the sale of an appreciated asset triggers a capital gain tax liability, there is 
a tradeoff between the benefits of diversification versus the tax costs triggered by 
rebalancing the portfolio. In addition to the tradeoff between diversification and 
taxes, many individual investors also have to deal with both a tax-deferred and 
a taxable account. In this context an investor faces an asset location problem 
in addition to the usual asset allocation problem. Asset location refers to the 
problem of how the investor should locate her portfolio holdings across the tax- 
deferred and taxable accounts. 


Basic Case: Tax Management Only 


In the United States tax code, capital gains and losses are triggered when assets 
are sold. This feature means that the investor could manage her assets in ways 
that reduce her tax liabilities by choosing when to realize gains or losses. In this 
section we describe some models for optimal tax trading. 

One of the earliest and most basic models for optimal tax trading was intro- 
duced by Constantinides (1983). In this model it is assumed that the tax rate 
on capital gains is independent of the length of the holding period. It is also 
assumed that capital losses generate tax rebates. Finally, it is assumed that 
there are no transaction costs, no capital loss restrictions, and no wash-sale 
restrictions. A wash sale occurs when an asset is sold at a capital loss and the 
same or substantially identical one is also purchased within 30 days before or after 
the sale. Under these assumptions the optimal tax-trading strategy is relatively 
simple: Realize losses as soon as they occur and defer gains indefinitely. By 
realizing losses, the investor gets a tax rebate. If the investor did not realize the 
loss as soon as it happened, the opportunity for a tax rebate could disappear. 
Constantinides’s model can be extended to account for proportional transaction 
costs. If there are proportional transaction costs, then the optimal tax-trading 
strategy would still be to defer gains but to realize losses only beyond a certain 
threshold. The exact size of the threshold depends on the size of the transaction 
costs, the tax rate, and the asset’s volatility.

In a more elaborate follow-up article Constantinides (1984) proposed a model
that considers a more realistic setting where the tax rate depends on the length 
of the holding period. In this model the sale of assets with long-term status is 
taxed at a rate lower than that of assets with short-term status. In this case the 
optimal tax-trading strategy still calls for realizing losses as soon as they occur. 
In addition and somewhat surprisingly, it is also sometimes optimal to sell (and 
immediately repurchase) assets with an embedded long-term gain. The rationale 
for this action is that there is a “re-start” option associated with resetting the 
tax basis and having the opportunity to realize short-term losses. The value of 
this re-start option depends on the asset volatility and the ratio of the short-term 
and long-term capital tax rates. The following example provided by C. Spatt 
illustrates this phenomenon. 


Example 14.1 Consider an asset with current price $P_0 = \$20$. Suppose that
at dates $t = 0,1$ we have

$$P_{t+1} = \begin{cases} P_t + k & \text{with probability } 0.5 \\ P_t - k & \text{with probability } 0.5. \end{cases}$$

Assume an investor buys one share of this asset at date $t = 0$. Our goal is
to determine the trading strategy (realize/not realize) at dates $t = 1,2$ that
minimizes expected taxes.


(a) First consider the following case. The short-term and long-term capital gain
tax rates are respectively $\tau_s = 0.5$ and $\tau_\ell = 0.5\gamma$, where $0 < \gamma < 1$. The sale
of shares held for one period can be treated as either short-term or long-term
depending on what is more advantageous to the investor. The sale of shares
held for two periods is treated as long-term. Assume there are no transaction
costs.

In this case at dates $t = 1$ and $t = 2$ it is optimal to realize losses.
At date $t = 1$ it is optimal to realize a long-term gain if $\gamma < 0.5$. See
Exercise 14.2.

(b) Now consider the case when the capital gain tax rate is $\tau = 0.2$ for both
long-term and short-term gains or losses. Assume a transaction cost of 0.5
per share traded.

In this case at date $t = 2$ it is optimal to realize a loss if $k > 1.25$.

At date $t = 1$ it is optimal to realize a loss if $k > 5$.

Since short-term and long-term rates are the same, it is optimal not to
realize gains at any date.


Portfolio Choice with Taxes 


We now turn our attention to the problem of dynamic portfolio choice in the 
presence of capital gains taxes. The model below is a simplified version of a 
model proposed by Dammon et al. (2001). 


1 Personal communication from C. Spatt.

We consider an economy with a risky and a risk-free asset where investors
live for $T$ periods. We also assume that in this economy investors are endowed
with some initial capital and their goal is to maximize some expected utility of
consumption $C_t$ at dates $t = 0,1,\dots,T$ and bequest $W_T$ at date $T$. The return of
the risk-free asset between date $t-1$ and date $t$ is $r$. The price of the risky asset
is serially independent and follows a binomial process. Let $P_t$ denote the price of
the risky asset at date $t$. Let $n_t$ and $m_t$ denote respectively the number of shares
of the risky and risk-free assets held right after trading at date $t$. Throughout
the model we will assume no shorting, i.e., we will impose the constraints $n_t \geq 0$
and $m_t \geq 0$.

We assume that capital gains are taxed at a rate $\tau$, and capital losses are
credited at the same rate. To compute the capital gain triggered by an asset
sale, we assume that the tax basis $P_t^*$ for the shares at date $t$ is the weighted
average price of those shares. Therefore, the tax basis $P_t^*$ evolves according to
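the following law of motion:

$$P_t^* = \begin{cases} \dfrac{n_{t-1} P_{t-1}^* + (n_t - n_{t-1})^+ \cdot P_t}{n_{t-1} + (n_t - n_{t-1})^+} & \text{if } P_{t-1}^* < P_t \\[2ex] P_t & \text{if } P_{t-1}^* \geq P_t. \end{cases}$$

Right after trading at date $t$, the realized capital gain or loss $G_t$ is given by

$$G_t = \begin{cases} (n_{t-1} - n_t)^+ (P_t - P_{t-1}^*) & \text{if } P_{t-1}^* < P_t \\ n_{t-1} (P_t - P_{t-1}^*) & \text{if } P_{t-1}^* \geq P_t. \end{cases}$$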
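The two laws of motion above can be written as a pair of small functions. This is a minimal sketch: the function and argument names are hypothetical, and the guard for an empty position is an added assumption that the text does not discuss. When the previous basis is at or above the current price, the formulas reset the basis to the current price and count the loss on all previously held shares; otherwise purchases are averaged into the basis and only shares actually sold generate a gain.

```python
def update_basis(basis_prev, n_prev, n_new, price):
    """Tax basis after trading: weighted-average purchase price, reset on an embedded loss."""
    if basis_prev >= price:
        return price
    bought = max(n_new - n_prev, 0.0)
    total = n_prev + bought
    if total == 0.0:
        return price  # assumption: with no shares held, use the current price as basis
    return (n_prev * basis_prev + bought * price) / total


def realized_gain(basis_prev, n_prev, n_new, price):
    """Realized capital gain (positive) or loss (negative) right after trading."""
    if basis_prev >= price:
        return n_prev * (price - basis_prev)
    sold = max(n_prev - n_new, 0.0)
    return sold * (price - basis_prev)


# Hypothetical position: 10 shares with basis 8, current price 10.
print(update_basis(8.0, 10.0, 15.0, 10.0), realized_gain(8.0, 10.0, 15.0, 10.0))  # buy 5 shares
print(update_basis(8.0, 10.0, 6.0, 10.0), realized_gain(8.0, 10.0, 6.0, 10.0))    # sell 4 shares
print(update_basis(8.0, 10.0, 10.0, 7.0), realized_gain(8.0, 10.0, 10.0, 7.0))    # price below basis
```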


We have the following inter-temporal balance of wealth equation that relates the
portfolio holdings at date $t-1$ to the portfolio holdings at dates $t = 1,\dots,T-1$:

$$n_t P_t + m_t + C_t = n_{t-1} P_t + m_{t-1}(1 + r) - \tau G_t.$$

Similarly, at date $T$ we have

$$W_T = n_{T-1} P_T + m_{T-1}(1 + r) - \tau G_T.$$


The portfolio choice problem can be stated as a dynamic programming problem
where the state variables at date $t$ are $(P_t, P_{t-1}^*, n_{t-1}, m_{t-1})$, the actions at time
$t$ are $(n_t, m_t, C_t)$, and the objective is

$$\max_{C_t, n_t, m_t} \; \mathrm{E}\left[\sum_{t=0}^{T} U(C_t, t) + B(W_T)\right].$$


The following example illustrates the striking effect of taxes in portfolio choice. 


Example 14.2 Assume that at date $t = 0$ the holdings are $n_{-1} > 0$ and
$m_{-1} = 0$. In other words, our entire portfolio is invested in the risky asset.
Assume $r = 0$, $P_0 = 1$, $0 < 1 - k < P_{-1}^* = 1 - \delta < 1$, and

$$P_1 = \begin{cases} P_0 + k & \text{with prob } 1/2 \\ P_0 - k & \text{with prob } 1/2. \end{cases}$$

Assume $T = 1$ and our goal is to determine the portfolio holdings at date $t = 0$
so as to maximize some utility of final wealth $\mathrm{E}(U(W_1))$. Since there is no
consumption at date $t = 0$ we have

$$n_0 P_0 + m_0 = n_{-1} P_0 + m_{-1} - \tau (n_{-1} - n_0)^+ (P_0 - P_{-1}^*).$$

Thus

$$m_0 = (n_{-1} - n_0)(1 - \tau\delta).$$

And so the only variable in our problem is $n_0$ subject to the constraints
$0 \leq n_0 \leq n_{-1}$.
Furthermore, the balance of wealth equation yields

$$W_1 = n_0 P_1 + m_0 + \tau n_0 (P_0^* - P_1)^+ = \begin{cases} n_0(\tau\delta + k) + n_{-1}(1 - \tau\delta) & \text{with prob } 1/2 \\ n_0(\tau k - k) + n_{-1}(1 - \tau\delta) & \text{with prob } 1/2. \end{cases}$$

It is evident that in the absence of taxes ($\tau = 0$), the optimal holding of the
risky asset is $n_0 = 0$ for any positive level of risk aversion. However, it is easily
checked numerically that $n_0$ may vary all the way between 0 and $n_{-1}$ for positive
values of $\tau$. In particular, for $\tau = 0.2$, $\delta = 0.1$, $k = 0.2$ and $n_{-1} = 1$ we get
$n_0 = 0.8352$ for the logarithmic utility function.
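The numerical claim in this example can be reproduced with a short script. The sketch below assumes an initial holding of one share and logarithmic utility; the function names are hypothetical. Terminal wealth is n0 * (tau * delta + k) + n_prev * (1 - tau * delta) after an up move and n0 * (tau * k - k) + n_prev * (1 - tau * delta) after a down move, and the expected log utility is maximized over holdings between zero and the initial holding by bisection on its derivative.

```python
def terminal_wealth(n0, n_prev, tau, delta, k):
    """Terminal wealth in the up and down states for a date-0 holding n0."""
    safe = n_prev * (1.0 - tau * delta)
    up = n0 * (tau * delta + k) + safe
    down = n0 * (tau * k - k) + safe
    return up, down


def optimal_holding(n_prev, tau, delta, k, iters=200):
    """Maximize 0.5*log(W_up) + 0.5*log(W_down) over 0 <= n0 <= n_prev."""
    def slope(n0):
        up, down = terminal_wealth(n0, n_prev, tau, delta, k)
        return 0.5 * (tau * delta + k) / up + 0.5 * (tau * k - k) / down

    if slope(0.0) <= 0.0:
        return 0.0          # holding any risky shares lowers expected utility
    if slope(n_prev) >= 0.0:
        return n_prev       # the upper bound is binding
    lo, hi = 0.0, n_prev
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n0_tax = optimal_holding(1.0, 0.2, 0.1, 0.2)
n0_no_tax = optimal_holding(1.0, 0.0, 0.1, 0.2)
print(round(n0_tax, 4), n0_no_tax)
```

With the tax, selling shares triggers a certain tax payment on the embedded gain, while losses at date 1 are credited, which is why a risk-averse investor keeps most of the risky position.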


The numerical solution to the more general model in Dammon et al. (2001) 
reveals the following interesting insights. As expected, it is optimal to realize 
capital losses as soon as they occur. Since diversification is more valuable to 
young investors, it is optimal for them to sell assets with large embedded capital 
gains to rebalance their portfolios. On the other hand, elderly investors defer 
most capital gains. Because in the US tax code there is a tax forgiveness at 
death, it is optimal for elderly investors to increase their allocations to equity as 
they approach their terminal age. 

If in addition to some initial endowment an investor receives income, then 
the following insights are again revealed by a numerical solution to the model. 
Young investors hold more equity, very much in line with popular financial 
planning advice. Because of capital gain taxes, it is optimal to use income to 
adjust asset allocation instead of selling assets with embedded capital gains. In 
years immediately prior to retirement it is optimal to reduce equity allocation, 
again in line with popular financial planning advice. Finally, beyond retirement 
it is optimal to have a gradual increase in equity holdings. 


Asset Allocation and Asset Location 


The availability of various kinds of tax-deferred retirement accounts such as
401(k), 403(b), IRA, and Keogh gives investors the ability to shelter some of their
assets from taxes. Since assets may be held both in a tax-deferred as well as in
a taxable account, the location decision has important implications on portfolio 
choice. Dammon et al. (2004) developed a model to study this problem. Via 
an arbitrage argument, they show that it is optimal to allocate assets to the 
tax-deferred account in descending order of tax exposure until the limit of the
tax-deferred account is reached. In particular, the assets with the highest taxable
yields, such as taxable bonds, should go in the tax-deferred account. If the limit of 
the tax-deferred account is reached, then assets with lower taxable yields should 
be allocated to the taxable account. 

The numerical solution to the model in Dammon et al. (2004) also shows 
that, the larger the fraction of wealth in the tax-deferred account, the higher the 
fraction of total wealth allocated to assets with higher taxable yield. 


Exercises 


Exercise 14.1 Consider the optimal consumption and investment model dis- 
cussed in Section 14.2. 


(a) Prove the envelope condition by proceeding as follows. Let $C_t^*(W_t)$ and
$x_t^*(W_t)$ denote the optimal consumption and optimal portfolio at stage $t$
respectively. Thus

$$J_t(W_t) = \mathrm{E}_t\left[J_{t+1}\left((W_t - C_t^*(W_t)) \cdot (R_{f,t+1} + r_{t+1}^{\mathsf T} x_t^*(W_t))\right) + U(C_t^*(W_t))\right].$$

Use the chain rule to differentiate both sides above with respect to $W_t$. Then
use the optimality conditions for the Bellman equation to show that the
expression for the derivative of the right-hand side simplifies to $U'(C_t^*(W_t))$,
thereby giving the envelope condition

$$J'_t(W_t) = U'(C_t^*(W_t)).$$

(b) Consider the special case $U(C) = \log(C)$, $B(W) = \log(W)$. Use induction
to prove the following expressions for the value function and optimal
consumption:

$$J_t(W_t) = (T - t + 1) \log(W_t) + b_t$$

and

$$C_t^*(W_t) = \frac{W_t}{T - t + 1},$$

where $b_t$ depends on the joint distribution of $R_{f,t+1}$, $r_{t+1}$.


Exercise 14.2 Consider the model for short-term versus long-term taxes 
described in Example 14.1. 


(a) Suppose the short-term and long-term capital gain tax rates are respectively
$\tau_s = 0.5$ and $\tau_\ell = 0.5\gamma$, where $0 < \gamma < 1$.
Prove that indeed at dates $t = 1$ and $t = 2$ it is optimal to realize losses,
and at date $t = 1$ it is optimal to realize a long-term gain if $\gamma < 0.5$.
(b) Suppose the capital gain tax rate is $\tau = 0.2$ for both long-term and
short-term gains or losses and there is a transaction cost of 0.5 per share traded.
Prove that at date $t = 2$ it is optimal to realize a loss if $k > 1.25$, and at
date $t = 1$ it is optimal to realize a loss if $k > 5$. Prove that it is optimal not
to realize gains at any date.


Exercise 14.3 Consider the model for portfolio choice with taxes described in 
Example 14.2. 


(a) Prove that if $\tau = 0$ then the optimal holding of the risky asset at $t = 0$ is
$n_0 = 0$ for any risk-averse concave utility function $U(W)$.

(b) Check numerically that for $\tau = 0.2$, $\delta = 0.1$, $k = 0.2$, and logarithmic utility
function $U(W) = \log(W)$, the optimal holding at $t = 0$ is $n_0 = 0.8352$.


Exercise 14.4 


(a) Consider the following minimum-variance portfolio optimization problem
with risky assets, transaction costs, and no constraints:

$$\min_x \left\{ \frac{1}{2} x^{\mathsf T} V x + \frac{1}{2} (x - x_0)^{\mathsf T} R (x - x_0) \right\}. \qquad (14.2)$$

Here $x_0$ is some initial portfolio, $V$ is the covariance matrix of asset returns,
and $R$ is a symmetric positive definite matrix that models the transaction
cost incurred in changing the initial portfolio $x_0$ to the new portfolio $x$.
Prove that the solution to (14.2) is

$$x^* = (V + R)^{-1} R x_0.$$


(b) Now consider a multi-period version of the previous problem. Assume the
investor starts with an initial portfolio $x_0$ and her objective is

$$\min_{x_1,\dots,x_T} \sum_{t=1}^{T} \left( \frac{1}{2} x_t^{\mathsf T} V x_t + \frac{1}{2} (x_t - x_{t-1})^{\mathsf T} R (x_t - x_{t-1}) \right).$$

Apply dynamic programming to solve this problem. Proceed as follows:

• the stages are $t = 1,\dots,T$;
• the state at stage $t$ is $x_{t-1}$, that is, the portfolio previously set at stage $t-1$;
• the action at stage $t$ is the vector of holdings $x_t$;
• the cost at stage $t$ is the quadratic term

$$\frac{1}{2} x_t^{\mathsf T} V x_t + \frac{1}{2} (x_t - x_{t-1})^{\mathsf T} R (x_t - x_{t-1}).$$

(i) Show that the optimal decision rule $x_T^*$ and the value-to-go
function $J_T(x_{T-1})$ at the last stage $T$ are

$$x_T^* = (V + R)^{-1} R x_{T-1}$$

and

$$J_T(x_{T-1}) = \frac{1}{2} x_{T-1}^{\mathsf T} K_T x_{T-1},$$

where $K_T = R - R(V + R)^{-1} R$.

(ii) Use the Bellman equation and induction to prove that the optimal decision
rule $x_t^*$ and the value-to-go function $J_t(x_{t-1})$ at each stage
$t = T-1, T-2, \dots, 1$ are of the form

$$x_t^* = L_t x_{t-1}$$

and

$$J_t(x_{t-1}) = \frac{1}{2} x_{t-1}^{\mathsf T} K_t x_{t-1}$$

for some suitable matrices $L_t$ and $K_t$.
(c) Now consider an infinite horizon version of the previous problem. Assume
there is only one risky asset, the investor starts with an initial portfolio $x_0$
and her objective is

$$\min_{x_1, x_2, \dots} \sum_{t=1}^{\infty} \left( \frac{q}{2} x_t^2 + \frac{r}{2} (x_t - x_{t-1})^2 \right).$$

Use the following educated guess (“ansatz”) for the optimal value function:

$$V(x_{t-1}) = \frac{k}{2} x_{t-1}^2 \qquad (14.3)$$

for some constant $k > 0$. In other words, our educated guess is that the value
function starting at stage $t$ with state $x_{t-1}$, namely

$$V(x_{t-1}) = \min_{x_t, x_{t+1}, \dots} \sum_{\tau=t}^{\infty} \left( \frac{q}{2} x_\tau^2 + \frac{r}{2} (x_\tau - x_{\tau-1})^2 \right),$$

is of the form (14.3) for some constant $k > 0$.

Use the Bellman equation to verify this educated guess, determine the
optimal decision rule $x_t^*$, and find the value of $k$ in terms of $q, r$.

Give expressions for both $k$ and the optimal portfolio $x_t^*$ that are as explicit
as possible.


Exercise 14.5 


(a) Consider the mean-variance portfolio optimization problem with risk-free
asset and no constraints:

$$\max_x \left\{ \mu^{\mathsf T} x - \frac{\gamma}{2} x^{\mathsf T} V x \right\}. \qquad (14.4)$$

Here $\mu$ is the vector of excess returns, $V$ is the covariance matrix, $x$ is the
vector of holdings in risky assets, and $\gamma > 0$ is a risk-aversion constant.
Prove that the solution to (14.4) is

$$x^* = \frac{1}{\gamma} V^{-1} \mu.$$
(b) Now consider a variation of the previous problem that includes a quadratic
term for transaction costs:

$$\max_x \left\{ \mu^{\mathsf T} x - \frac{\gamma}{2} x^{\mathsf T} V x - \frac{\lambda}{2} (x - x_0)^{\mathsf T} V (x - x_0) \right\}. \qquad (14.5)$$

Here $x_0$ is some initial portfolio and $\lambda > 0$ is a transaction cost constant.
Prove that the solution to (14.5) is

$$x^* = \frac{1}{\gamma + \lambda} V^{-1} \mu + \frac{\lambda}{\gamma + \lambda} x_0.$$


(c) Now consider a multi-period version of the previous problem. Assume the
objective is

$$\max_{x_1,\dots,x_T} \mathrm{E}\left[\sum_{t=1}^{T} \left( \mu_t^{\mathsf T} x_t - \frac{\gamma}{2} x_t^{\mathsf T} V x_t - \frac{\lambda}{2} (x_t - x_{t-1})^{\mathsf T} V (x_t - x_{t-1}) \right)\right].$$

This assumes that the vector of expected returns may vary with time but not
the covariance matrix. Apply dynamic programming to solve this problem.
Proceed as follows:

• the stages are $t = 1,\dots,T$;
• the state at stage $t$ is $(x_{t-1}, \mu_t)$;
• the action at stage $t$ is the vector of holdings $x_t$;
• the reward at stage $t$ is the quadratic term

$$\mu_t^{\mathsf T} x_t - \frac{\gamma}{2} x_t^{\mathsf T} V x_t - \frac{\lambda}{2} (x_t - x_{t-1})^{\mathsf T} V (x_t - x_{t-1}).$$

(i) Show that the optimal decision rule $x_T^*$ and the value-to-go function
$J_T(x_{T-1}, \mu_T)$ at the last stage $T$ are

$$x_T^* = a V^{-1} \mu_T + b x_{T-1}$$

and

$$J_T(x_{T-1}, \mu_T) = \frac{c}{2} \mu_T^{\mathsf T} V^{-1} \mu_T + \frac{d}{2} x_{T-1}^{\mathsf T} V x_{T-1} + e \, \mu_T^{\mathsf T} x_{T-1}$$

for some constants $a, b, c, d, e$.

(ii) Assume $\mu_t$ has the following (simple and somewhat unrealistic) law of
motion:

$$\mu_{t+1} = \bar\mu + \rho(\mu_t - \bar\mu) + \epsilon_t,$$

where $\bar\mu, \rho$ are constants, $|\rho| < 1$, and $\epsilon_t$ is a vector of independently
normally distributed random shocks each with mean 0 and variance 1.
Use the Bellman equation and induction to prove that the optimal decision
rule $x_t^*$ and the value-to-go function $J_t(x_{t-1}, \mu_t)$ at each stage
$t = T-1, T-2, \dots, 1$ are

$$x_t^* = a_t V^{-1} \mu_t + b_t x_{t-1}$$

and

$$J_t(x_{t-1}, \mu_t) = \frac{c_t}{2} \mu_t^{\mathsf T} V^{-1} \mu_t + \frac{d_t}{2} x_{t-1}^{\mathsf T} V x_{t-1} + e_t \, \mu_t^{\mathsf T} x_{t-1} + f_t$$

for some constants $a_t, b_t, c_t, d_t, e_t, f_t$.


Dynamic Programming Models:
the Binomial Pricing Model 


One of the most common uses of dynamic programming in financial mathematics 
is through lattice models. In particular, the binomial lattice model of Cox et al. 
(1979) has become an indispensable tool for pricing derivative securities. This 
chapter describes this model and the underlying dynamic programming principles 
for the pricing of European options and the pricing and optimal exercising of 
American options. 


Binomial Lattice Model 


The binomial lattice provides a model for the price movements of a risky asset. 
It can be seen as a multi-period version of the single-period binomial model 
discussed in Section 4.3. The binomial lattice model describes the price of a 
risky asset at some discrete times 0,1,..., N. A basic period length, such as a 
week, day, or second, is assumed to elapse between any two consecutive times. 
The model assumes that if the share price of the risky asset is $S_k$ at time $k$ then
the share price $S_{k+1}$ at time $k+1$ can take two values, namely $S_{k+1} = u \cdot S_k$ and
$S_{k+1} = d \cdot S_k$ where $u > d > 0$ are multiplicative factors ($u$ stands for “up” and
$d$ for “down” factors). The probabilities assigned to these two possible states are
$p$ and $1 - p$ respectively, where $0 < p < 1$. The multi-stage price structure can
be represented on a lattice as illustrated in Figure 15.1.

After $k$ time periods, the asset price can take $k + 1$ different values. If the price
at stage 0 is $S_0$, then the price $S_k$ at stage $k$ is $u^j d^{k-j} S_0$ if there are $j$ up moves
and $k - j$ down moves. Observe that there are $\binom{k}{j}$ possible paths to reach the
node corresponding to $j$ up moves and $k - j$ down moves after $k$ periods. Therefore
the probability that the price is $u^j d^{k-j} S_0$ in stage $k$ is $\binom{k}{j} p^j (1-p)^{k-j}$ because
between two consecutive times the probability of an up move is $p$ whereas that
of a down move is $1 - p$.
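The node prices and their probabilities at a given stage can be tabulated directly from these formulas. The following is a small sketch; the function name `lattice_layer` and the parameter values in the usage lines are hypothetical choices for illustration.

```python
from math import comb


def lattice_layer(S0, u, d, p, k):
    """Return (price, probability) pairs for the k + 1 nodes at stage k.

    The node with j up moves and k - j down moves has price S0 * u**j * d**(k - j)
    and is reached with probability comb(k, j) * p**j * (1 - p)**(k - j).
    """
    return [(S0 * u**j * d**(k - j), comb(k, j) * p**j * (1 - p)**(k - j))
            for j in range(k + 1)]


# Hypothetical lattice: S0 = 40, u = 2, d = 0.5, up-move probability 0.5, stage 3.
layer = lattice_layer(40.0, 2.0, 0.5, 0.5, 3)
for price, prob in layer:
    print(price, prob)
```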


Option Pricing 


Using the above binomial lattice model for the price process of an underlying
risky asset, the value of an option on this asset can be computed by dynamic
programming using backward recursion, working from the maturity date (time $N$)
back to time 0 (the current time). The approach fits within the stochastic setting
introduced in Section 13.7. More precisely, the stages of the dynamic program
are the discrete times $k = 0,\dots,N$. The state at stage $k$ is the asset price $S_k$.
Thus $S_k$ can take the $k + 1$ possible values defined by the $k + 1$ nodes in the $k$th
layer of the lattice. The state $S_N$ at the final stage $N$ is the terminal state. The
law of motion for the asset price is as follows:

$$S_{k+1} = \begin{cases} u S_k & \text{with probability } p \\ d S_k & \text{with probability } q = 1 - p \end{cases}$$

for $k = 0,1,\dots,N-1$. However, the following adjustment of utmost importance
must be made: for option pricing purposes, we do not use these actual
probabilities $p$ and $q = 1 - p$ but instead the risk-neutral probabilities $\tilde p$ and
$\tilde q = 1 - \tilde p$ as explained below.

[Figure 15.1 Asset price in the binomial lattice model; horizontal axis: time 0, 1, 2, 3]

European Options 


Consider a European option contract that matures at time $N$ with payoff $g(S_N)$
for some function $g(\cdot)$ of the underlying asset price $S_N$. For instance, if the
contract is a European call option maturing at time $N$ with strike price $K$, the
payoff at maturity is $g(S_N) = (S_N - K)^+$. Similarly, if the contract is a European
put option maturing at time $N$ with strike price $K$, the payoff at maturity is
$g(S_N) = (K - S_N)^+$.

Let $V_k(S_k)$ denote the value of the option at stage $k$ when the asset price is $S_k$.
This is the value-to-go function in our dynamic program. The value of the option
at stage 0 is given by $V_0(S_0)$. This is the quantity that we have to compute in
order to solve the option pricing problem. At the final time $N$ the value-to-go
function is given by the payoff of the option contract. That is,

$$V_N(S_N) = g(S_N).$$


Since we are dealing with a European option, we can compute the value $V_k(\cdot)$
in terms of $V_{k+1}(\cdot)$. The single-period subproblem between stages $k$ and $k + 1$ is
identical to the single-period binomial model discussed in Section 4.3. Therefore,
the value $V_k(S_k)$ can be obtained via the risk-neutral probabilities (4.4), namely

$$\tilde p = \frac{1 + r - d}{u - d} \quad \text{and} \quad \tilde q = \frac{u - 1 - r}{u - d},$$

where $r$ is the one-period return on the risk-free asset between time $k$ and time $k + 1$.
Thus for European options the value-to-go functions $V_k(\cdot)$ can be recursively
computed as

$$V_k(S_k) = \frac{1}{1 + r} \left( \tilde p \, V_{k+1}(u S_k) + \tilde q \, V_{k+1}(d S_k) \right). \qquad (15.1)$$


Example 15.1 Consider a binomial lattice model with N = 3,u = 2,d = 


$7 = 0.25, and Sg = 40 as depicted in Figure 15.2. Compute the price of a 


European call option with strike price K = 50. 


Figure 15.2 Binomial lattice with $u = 2$, $d = 0.5$, $S_0 = 40$, $N = 3$


For these values of $u$, $d$, $r$ the risk-neutral probabilities are

$$\tilde{p} = \tilde{q} = \frac{0.75}{1.5} = \frac{1}{2}.$$

The option value $V_3(S_3) = (S_3 - 50)^+$ at the final stage 3 is as follows:


$$V_3(5) = V_3(20) = 0, \quad V_3(80) = 30, \quad V_3(320) = 270.$$


Next, applying (15.1) we get $V_2(S_2)$:

$$V_2(10) = 0, \quad V_2(40) = \frac{1}{1.25} \cdot \frac{30}{2} = 12, \quad V_2(160) = \frac{1}{1.25} \cdot \frac{30 + 270}{2} = 120.$$

Applying (15.1) again we get $V_1(S_1)$:

$$V_1(20) = \frac{1}{1.25} \cdot \frac{12}{2} = 4.8, \quad V_1(80) = \frac{1}{1.25} \cdot \frac{120 + 12}{2} = 52.8.$$

Finally applying (15.1) one more time, we get the option value $V_0(S_0)$:

$$V_0(40) = \frac{1}{1.25} \cdot \frac{4.8 + 52.8}{2} = 23.04.$$


Example 15.2 Consider a binomial lattice model with $N = 3$, $u = 2$, $d = 1/2$,
$r = 0.25$, and $S_0 = 40$. Compute the price of a European put option with strike
price $K = 60$.


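Again the risk-neutral probabilities are $\tilde{p} = \tilde{q} = \frac{1}{2}$. We can proceed as in
Example 15.1. The option value $V_3(S_3) = (60 - S_3)^+$ at the final stage 3 is


$$V_3(5) = 55, \quad V_3(20) = 40, \quad V_3(80) = V_3(320) = 0.$$

Next, applying (15.1) we get $V_2(S_2)$:

$$V_2(10) = \frac{1}{1.25} \cdot \frac{55 + 40}{2} = 38, \quad V_2(40) = \frac{1}{1.25} \cdot \frac{40}{2} = 16, \quad V_2(160) = 0.$$

Applying (15.1) again we get $V_1(S_1)$:

$$V_1(20) = \frac{1}{1.25} \cdot \frac{38 + 16}{2} = 21.6, \quad V_1(80) = \frac{1}{1.25} \cdot \frac{16}{2} = 6.4.$$

Finally applying (15.1) one more time, we get the option value $V_0(S_0)$:

$$V_0(40) = \frac{1}{1.25} \cdot \frac{21.6 + 6.4}{2} = 11.2.$$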
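
The backward recursion (15.1) is short enough to sketch in code. The following Python fragment is only an illustration, not part of the original text: the function name `binomial_european` and the list-based lattice layout (node `j` at stage `k` means `j` up-moves) are assumptions made here.

```python
def binomial_european(payoff, s0, u, d, r, n):
    """Price a European claim on an n-period binomial lattice via (15.1)."""
    p = (1 + r - d) / (u - d)   # risk-neutral probability of an up-move
    q = 1 - p                   # risk-neutral probability of a down-move
    # Terminal values: node j at stage n has j up-moves and n - j down-moves.
    values = [payoff(s0 * u**j * d**(n - j)) for j in range(n + 1)]
    # Step backward one stage at a time, discounting the risk-neutral mean.
    for k in range(n - 1, -1, -1):
        values = [(p * values[j + 1] + q * values[j]) / (1 + r)
                  for j in range(k + 1)]
    return values[0]


# Data of Examples 15.1 and 15.2: N = 3, u = 2, d = 1/2, r = 0.25, S_0 = 40.
call = binomial_european(lambda s: max(s - 50, 0.0), 40, 2, 0.5, 0.25, 3)
put = binomial_european(lambda s: max(60 - s, 0.0), 40, 2, 0.5, 0.25, 3)
print(round(call, 4), round(put, 4))
```

With the data of the two examples the sketch reproduces the values 23.04 and 11.2 computed by hand above.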


American Options 


Consider now an American option contract that can be exercised at any time
$k = 0, 1, \dots, N$ with payoff $g(S_k)$ for some function $g(\cdot)$ of the underlying asset
price $S_k$. The key difference between this type of American option contract and
the above type of European option contract is the possibility of early exercise.
Because of this additional feature in the contract, the pricing problem of an
American option needs to account for the optimal exercise timing of the option.
This is accomplished via an adjustment to the previous recursion in the calculation
of the value function.

Once again, at the final time $N$ the value-to-go function is given by the payoff
of the option contract. That is,


$$V_N(S_N) = g(S_N).$$

The computation of the value-to-go function $V_k(\cdot)$ in terms of $V_{k+1}(\cdot)$ needs to
reflect the possibility of early exercise. To that end, the recursive formula (15.1)
needs to be amended as follows:


$$V_k(S_k) = \max\left\{ \frac{1}{1+r}\left(\tilde{p}\,V_{k+1}(uS_k) + \tilde{q}\,V_{k+1}(dS_k)\right),\; g(S_k) \right\}. \tag{15.2}$$


In words, the value of the option at stage $k$ is the maximum of the following two
quantities: the discounted risk-neutral expected value of the option at stage $k+1$, and the
payoff obtained if the option is exercised immediately. When the latter is larger,
it is optimal to exercise the option at stage $k$.


Example 15.3 Consider a binomial lattice model with $N = 3$, $u = 2$, $d = 1/2$,
$r = 0.25$, and $S_0 = 40$. Compute the price of an American call option with strike
price $K = 50$.


Again the risk-neutral probabilities are $\tilde{p} = \tilde{q} = \frac{1}{2}$. The option value
$V_3(S_3) = (S_3 - 50)^+$ at the final stage 3 is

$$V_3(5) = V_3(20) = 0, \quad V_3(80) = 30, \quad V_3(320) = 270.$$

Next, applying (15.2) we get $V_2(S_2)$:

$$V_2(10) = 0, \quad V_2(40) = \max\left\{ \frac{1}{1.25} \cdot \frac{30}{2},\; (40 - 50)^+ \right\} = 12,$$

$$V_2(160) = \max\left\{ \frac{1}{1.25} \cdot \frac{30 + 270}{2},\; (160 - 50)^+ \right\} = 120.$$


Observe that regardless of the value of $S_2$, it is not optimal to exercise the option
at stage 2.
Applying (15.2) again we get $V_1(S_1)$:

$$V_1(20) = \max\left\{ \frac{1}{1.25} \cdot \frac{12}{2},\; (20 - 50)^+ \right\} = 4.8,$$

$$V_1(80) = \max\left\{ \frac{1}{1.25} \cdot \frac{120 + 12}{2},\; (80 - 50)^+ \right\} = 52.8.$$


Observe that regardless of the value of $S_1$, it is not optimal to exercise the option
at stage 1.
Finally applying (15.2) one more time, we get the option value $V_0(S_0)$:

$$V_0(40) = \max\left\{ \frac{1}{1.25} \cdot \frac{52.8 + 4.8}{2},\; (40 - 50)^+ \right\} = 23.04.$$


Observe that there is no difference in price and in exercise policy between the 
American and the European call options. 
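
The recursion (15.2) differs from (15.1) only by the comparison with the immediate payoff, so it can be sketched in the same way. The fragment below is an illustration under assumptions made here, not the authors' code: the name `binomial_american` and the convention of reporting exercise nodes as pairs `(k, j)` (stage `k`, `j` up-moves) are hypothetical choices.

```python
def binomial_american(payoff, s0, u, d, r, n):
    """Price an American claim on an n-period binomial lattice via (15.2).

    Returns the stage-0 value and the set of nodes (k, j) at which immediate
    exercise is strictly better than holding; j counts up-moves at stage k.
    """
    p = (1 + r - d) / (u - d)   # risk-neutral probability of an up-move
    q = 1 - p

    def price(k, j):
        return s0 * u**j * d**(k - j)

    values = [payoff(price(n, j)) for j in range(n + 1)]
    exercise_nodes = set()
    for k in range(n - 1, -1, -1):
        new_values = []
        for j in range(k + 1):
            hold = (p * values[j + 1] + q * values[j]) / (1 + r)
            stop = payoff(price(k, j))
            if stop > hold:
                exercise_nodes.add((k, j))
            new_values.append(max(hold, stop))
        values = new_values
    return values[0], exercise_nodes


# Data of Example 15.3: American call with K = 50.
call_value, call_nodes = binomial_american(
    lambda s: max(s - 50, 0.0), 40, 2, 0.5, 0.25, 3)
print(round(call_value, 4), call_nodes)
```

For the call of Example 15.3 the set of early-exercise nodes comes back empty, in line with the observation that the American and European calls have the same price and exercise policy. The same function accepts any other payoff, for instance a put payoff.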


Example 15.4 Consider a binomial lattice model with $N = 3$, $u = 2$, $d = 1/2$,
$r = 0.25$, and $S_0 = 40$. Compute the price of an American put option with strike
price $K = 60$.


Once again, the risk-neutral probabilities are $\tilde{p} = \tilde{q} = \frac{1}{2}$. The option value
$V_3(S_3) = (60 - S_3)^+$ at the final stage 3 is


$$V_3(5) = 55, \quad V_3(20) = 40, \quad V_3(80) = V_3(320) = 0.$$

Next, applying (15.2) we get $V_2(S_2)$:

$$V_2(10) = \max\left\{ \frac{1}{1.25} \cdot \frac{55 + 40}{2},\; (60 - 10)^+ \right\} = 50,$$

$$V_2(40) = \max\left\{ \frac{1}{1.25} \cdot \frac{40}{2},\; (60 - 40)^+ \right\} = 20, \quad V_2(160) = 0.$$

Observe that it is optimal to exercise the option at stage 2 when $S_2 = 10$ and
$S_2 = 40$.

Applying (15.2) again we get $V_1(S_1)$:

$$V_1(20) = \max\left\{ \frac{1}{1.25} \cdot \frac{50 + 20}{2},\; (60 - 20)^+ \right\} = 40,$$

$$V_1(80) = \max\left\{ \frac{1}{1.25} \cdot \frac{20}{2},\; (60 - 80)^+ \right\} = 8.$$

Observe that it is optimal to exercise the option at stage 1 when $S_1 = 20$.

Finally applying (15.2) one more time, we get the option value $V_0(S_0)$:

$$V_0(40) = \max\left\{ \frac{1}{1.25} \cdot \frac{40 + 8}{2},\; (60 - 40)^+ \right\} = 20.$$

Observe that it is optimal to exercise the option at this stage.

Notice that the prices of the American and European calls in Example 15.1
and Example 15.3 are identical. This happens because there is nothing to gain by
exercising the American call early. By contrast, there is a substantial difference
in the prices of the American and European puts in Example 15.2 and Example
15.4. This happens because sometimes it is advantageous to exercise an American
put early. The results of these examples illustrate the following far more general
property of American options.


Theorem 15.5 Consider the binomial lattice model described in Section 15.1,
and an American option contract on the underlying risky asset that can be
exercised at any time $k = 0, 1, \dots, N$ with payoff $g(S_k)$ for some function $g(\cdot)$.
If $r > 0$ and the function $g(\cdot)$ is convex and satisfies $g(0) = 0$, then the value of
the American option contract is the same as that of a European option contract
with payoff $g(S_N)$ that can only be exercised at stage $N$. In other words, early
exercising of the American option yields no advantage.


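Proof By (15.2), it suffices to show that the following equation and inequality
hold for $k = 0, 1, \dots, N - 1$:


$$V_k(S_k) = \frac{1}{1+r}\left(\tilde{p}\,V_{k+1}(uS_k) + \tilde{q}\,V_{k+1}(dS_k)\right) \ge g(S_k). \tag{15.3}$$

First, observe that for $k = 0, 1, \dots, N - 1$

$$S_k = \frac{1}{1+r}\left(\tilde{p}\,uS_k + \tilde{q}\,dS_k\right) \tag{15.4}$$

since $\tilde{p}, \tilde{q}$ are the risk-neutral probabilities.
Next, we prove (15.3) by (backward) induction on $k$. The assumptions on the
function $g(\cdot)$, equation (15.4), and $V_N(S_N) = g(S_N)$ imply that

$$
\begin{aligned}
g(S_{N-1}) &= g\left(\frac{r \cdot 0 + \tilde{p}\,uS_{N-1} + \tilde{q}\,dS_{N-1}}{1+r}\right) \\
&\le \frac{r}{1+r}\,g(0) + \frac{1}{1+r}\left(\tilde{p}\,g(uS_{N-1}) + \tilde{q}\,g(dS_{N-1})\right) \\
&= \frac{1}{1+r}\left(\tilde{p}\,V_N(uS_{N-1}) + \tilde{q}\,V_N(dS_{N-1})\right).
\end{aligned}
$$

Therefore (15.3) holds for $k = N - 1$. Suppose (15.3) holds for $k = j + 1 \le N - 1$.
The assumptions on $g(\cdot)$, equation (15.4), and the induction hypothesis imply
that

$$
\begin{aligned}
g(S_j) &= g\left(\frac{r \cdot 0 + \tilde{p}\,uS_j + \tilde{q}\,dS_j}{1+r}\right) \\
&\le \frac{r}{1+r}\,g(0) + \frac{1}{1+r}\left(\tilde{p}\,g(uS_j) + \tilde{q}\,g(dS_j)\right) \\
&\le \frac{1}{1+r}\left(\tilde{p}\,V_{j+1}(uS_j) + \tilde{q}\,V_{j+1}(dS_j)\right).
\end{aligned}
$$

Hence (15.3) holds for $k = j$ as well.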


Option Pricing in Continuous Time 


The binomial lattice model can be seen as a discrete version of a popular 
continuous-time geometric Brownian motion model. We next sketch some of the 
main ideas and results of this continuous model and its relation to the binomial 
lattice model discussed above. A full treatment of this topic is beyond the scope 
of this book. We refer the reader to Shreve (2000) for a detailed exposition of 
this topic. 

Suppose the continuous-time price $S_t$, with $t \in [0, T]$, of a risky asset evolves
according to the stochastic differential equation


$$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t, \tag{15.5}$$

where $\mu$ and $\sigma$ are constants representing the instantaneous drift and volatility
of the asset price $S_t$, and $W_t$ is a Brownian motion. The stochastic differential
equation (15.5) can be seen as a continuous-time analog of the one-period up or
down price movement in the binomial lattice model. The solution to (15.5) is the
continuous-time process


$$S_t = S_0\,e^{(\mu - \sigma^2/2)t + \sigma W_t}, \tag{15.6}$$


which can equivalently be written as

$$\log\frac{S_t}{S_0} = \left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W_t.$$

Techniques from stochastic calculus have led to the development of pricing
models for a wide variety of options provided the underlying risky asset is
modeled via a suitable stochastic differential equation. In particular, in their
seminal work Black and Scholes (1973) and Merton (1973)
derived a pricing formula for a European option on an underlying risky asset with
a price process modeled as a geometric Brownian motion. Specifically, consider
a call option maturing at time $T > 0$ with payoff $(S_T - K)^+$. Assume the price
of the underlying risky asset is as in (15.6) and the risk-free asset compounds
continuously at an instantaneous rate $r > 0$; that is, the price $B_t$ of the risk-free
asset is

$$B_t = B_0\,e^{rt}.$$


The Black-Scholes-Merton model yields the following explicit formula for the
price $V_t(S_t)$ at time $t \in [0, T]$ of a European call option with payoff $(S_T - K)^+$:


$$V_t(S_t) = \Phi(d_1)\,S_t - \Phi(d_2)\,K e^{-r(T - t)}, \tag{15.7}$$


where $\Phi(\cdot)$ is the standard normal cumulative distribution function and

$$d_1 = \frac{\log(S_t/K) + (r + \sigma^2/2)(T - t)}{\sigma\sqrt{T - t}}, \qquad d_2 = d_1 - \sigma\sqrt{T - t}.$$


The Black-Scholes-Merton model also yields the following formula for the
price $V_t(S_t)$ at time $t \in [0, T]$ of a European put option with payoff $(K - S_T)^+$:


$$V_t(S_t) = \Phi(-d_2)\,K e^{-r(T - t)} - \Phi(-d_1)\,S_t,$$


where $\Phi(\cdot)$, $d_1$, $d_2$ are the same as above.
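
The two closed-form prices can be evaluated with nothing more than the error function. The sketch below is illustrative only: the names `norm_cdf` and `black_scholes`, the argument `tau` for the time to maturity $T - t$, and the numerical inputs are assumptions made here, not data from the text.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def black_scholes(s, k, r, sigma, tau, kind="call"):
    """Black-Scholes-Merton price of a European call or put.

    s: current asset price, k: strike, r: continuously compounded rate,
    sigma: volatility, tau: time to maturity T - t (must be positive).
    """
    d1 = (log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    if kind == "call":
        return norm_cdf(d1) * s - norm_cdf(d2) * k * exp(-r * tau)
    return norm_cdf(-d2) * k * exp(-r * tau) - norm_cdf(-d1) * s


# Illustrative inputs (hypothetical, not taken from the text).
c = black_scholes(100.0, 100.0, 0.05, 0.20, 1.0, "call")
p = black_scholes(100.0, 100.0, 0.05, 0.20, 1.0, "put")
print(round(c, 4), round(p, 4))
```

A quick sanity check on any implementation of these two formulas is put-call parity: the call price minus the put price should equal $S_t - K e^{-r(T-t)}$.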

The binomial lattice model can be seen as a discrete approximation of the 
geometric Brownian motion. The following section and Exercise 15.4 at the end 
of the chapter elaborate on this approximation. 


Specifying the Model Parameters 


To specify the binomial lattice model, one needs to choose values for $u$, $d$, and
$p$. This is done by matching the mean and volatility of the asset price to the
mean and volatility of the above binomial distribution. Because the model is
multiplicative (the price $S$ of the asset being either $u \cdot S$ or $d \cdot S$ in the next
stage), it is convenient to work with $\log(S_{k+1}/S_k)$.

Let $S_k$ denote the asset price in stages $k = 0, \dots, N$. Let $\mu$ and $\sigma$ be the
mean and volatility of $\ln(S_N/S_0)$. (We assume that this information about the
asset is known.) Let $\Delta = 1/N$ denote the length between consecutive stages.
Then for $k = 0, 1, \dots, N - 1$ the mean and volatility of $\ln(S_{k+1}/S_k)$ are $\mu\Delta$ and
$\sigma\sqrt{\Delta}$ respectively. In the binomial lattice, a direct computation shows that for
$k = 0, 1, \dots, N - 1$ the mean and variance of $\ln(S_{k+1}/S_k)$ are $p\ln u + (1 - p)\ln d$
and $p(1 - p)(\ln u - \ln d)^2$ respectively. Matching these values we get two equations:


$$
\begin{aligned}
p\ln u + (1 - p)\ln d &= \mu\Delta \\
p(1 - p)(\ln u - \ln d)^2 &= \sigma^2\Delta.
\end{aligned}
$$


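Note that there are three parameters but only two equations, so we can set
$d = 1/u$ as in Cox et al. (1979). Then the equations simplify to


$$
\begin{aligned}
(2p - 1)\ln u &= \mu\Delta \\
4p(1 - p)(\ln u)^2 &= \sigma^2\Delta.
\end{aligned}
$$

Squaring the first and adding it to the second, we get $(\ln u)^2 = \sigma^2\Delta + (\mu\Delta)^2$.
This yields

$$u = e^{\sqrt{\sigma^2\Delta + (\mu\Delta)^2}}, \qquad d = e^{-\sqrt{\sigma^2\Delta + (\mu\Delta)^2}}, \qquad p = \frac{1}{2}\left(1 + \frac{1}{\sqrt{\sigma^2/(\mu^2\Delta) + 1}}\right).$$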


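When $\Delta$ is small, these values can be approximated as

$$u \approx e^{\sigma\sqrt{\Delta}}, \qquad d \approx e^{-\sigma\sqrt{\Delta}}, \qquad p \approx \frac{1}{2}\left(1 + \frac{\mu}{\sigma}\sqrt{\Delta}\right).$$

In other words, for small $\Delta$

$$
\log\frac{S_{k+1}}{S_k} =
\begin{cases}
\sigma\sqrt{\Delta} & \text{with probability } \frac{1}{2}\left(1 + \frac{\mu}{\sigma}\sqrt{\Delta}\right) \\
-\sigma\sqrt{\Delta} & \text{with probability } \frac{1}{2}\left(1 - \frac{\mu}{\sigma}\sqrt{\Delta}\right),
\end{cases}
$$

which is a discrete approximation of (15.5).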

As an example, consider a binomial model with 52 periods of one week each.
Consider also a stock with current known price $S_0$ and random price $S_{52}$ a year
from today. We are given the mean $\mu$ and volatility $\sigma$ of $\ln(S_{52}/S_0)$, say $\mu = 10\%$
and $\sigma = 30\%$. What are the parameters $u$, $d$, and $p$ of the binomial lattice? Since
$\Delta = \frac{1}{52}$ is small, we can use the second set of formulas:


$$
\begin{aligned}
u &\approx e^{0.30/\sqrt{52}} = 1.0425 \\
d &\approx e^{-0.30/\sqrt{52}} = 0.9593 \\
p &\approx \frac{1}{2}\left(1 + \frac{0.10}{0.30\sqrt{52}}\right) = 0.5231.
\end{aligned}
$$
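
Both sets of formulas are easy to evaluate side by side. The following sketch is an illustration only; the function name `lattice_parameters` and its `exact` flag are assumptions made here. The exact branch writes $p$ as $\frac{1}{2}(1 + \mu\Delta/\ln u)$, which is the first simplified matching equation solved for $p$ and is valid for $\mu \ge 0$.

```python
from math import exp, log, sqrt


def lattice_parameters(mu, sigma, n, exact=False):
    """Return (u, d, p) for a binomial lattice with d = 1/u and Delta = 1/n.

    exact=False uses the small-Delta approximation; exact=True solves the two
    moment-matching equations (assuming mu >= 0).
    """
    delta = 1.0 / n
    if exact:
        log_u = sqrt(sigma**2 * delta + (mu * delta) ** 2)
        p = 0.5 * (1.0 + mu * delta / log_u)
    else:
        log_u = sigma * sqrt(delta)
        p = 0.5 * (1.0 + (mu / sigma) * sqrt(delta))
    return exp(log_u), exp(-log_u), p


# Weekly lattice of the example: mu = 10%, sigma = 30%, 52 periods.
u, d, p = lattice_parameters(0.10, 0.30, 52)
u_ex, d_ex, p_ex = lattice_parameters(0.10, 0.30, 52, exact=True)
print(round(u, 4), round(p, 4))
```

For weekly steps the approximate and exact parameters agree to several decimal places, which is why the simpler formulas are commonly preferred in this setting.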


Exercises 


Exercise 15.1 Apply Theorem 15.5 to show that the price of a European call 
option and an American call option with the same strike price and expiration 
date are the same in the binomial lattice model. Why does Theorem 15.5 not 
apply for put options? 


Exercise 15.2 Compute the value of an American call option on a stock with
current price equal to $100, strike price equal to $102, and expiration date four
weeks from today. The yearly volatility of the logarithm of the stock return is
σ = 0.30. The risk-free interest rate is 4%. Use a binomial lattice with N = 4.


Exercise 15.3 Compute the value of an American put option on a stock with
current price equal to $100, strike price equal to $98, and expiration date five
weeks from today. The yearly volatility of the logarithm of the stock return is
σ = 0.30. The risk-free interest rate is 4%. Use a binomial lattice with N = 4.


Exercise 15.4 This is a computational exercise. Repeat Exercises 15.2 and 
15.3 using a binomial lattice with N = 10, N = 100, and N = 1000. Compare 
the results obtained for the call option with those given by the Black—Scholes— 
Merton formula (15.7). 
16.1


Multi-Stage Stochastic Programming


Stochastic programming is a computational approach to stochastic optimization. 
Stochastic programs have been studied for several decades (see, e.g., Birge and 
Louveaux, 1997; Shapiro et al., 2009). Typically, the approach hinges on a 
reformulation of the stochastic optimization problem as a deterministic one via 
a scenario tree. Computational and algorithmic advances have made stochastic 
programming techniques applicable to various classes of real-world problems. 


Multi-Stage Stochastic Programming 


Multi-stage stochastic optimization can be seen as a generalization of the generic
class of stochastic optimization model discussed in Chapter 10. Let $0, 1, \dots, T$
index a set of stages where decisions are to be made. Assume that between two
consecutive stages $t - 1$ and $t$ some random outcome $\omega_t$ is revealed. At each
stage $t = 0, 1, \dots, T$ we make a set of non-anticipatory decisions $x_t$ that can only
depend on the random information revealed up until that stage. Schematically,
the process can be seen as follows:


decision $x_0$ → random draw $\omega_1$ → decision $x_1$ → random draw $\omega_2$ → ⋯ → random draw $\omega_T$ → decision $x_T$


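A multi-stage stochastic minimization problem is the following kind of multi-fold
version of the two-stage stochastic model (10.2) discussed in Chapter 10:


$$
\begin{array}{ll}
\min & g_0(x_0) + \mathbb{E}[Q_1(x_0, \omega_1)] \\
     & x_0 \in \mathcal{X}_0,
\end{array}
\tag{16.1}
$$


where the recourse term $Q_1(x_0, \omega_1)$ similarly depends on the decisions to be
made at later stages:


$$
\begin{array}{rl}
Q_1(x_0, \omega_1) := \min & g_1(x_1, \omega_1) + \mathbb{E}[Q_2(x_1, \omega_2)] \\
     & x_1 \in \mathcal{X}_1(x_0, \omega_1),
\end{array}
$$

with

$$
\begin{array}{rl}
Q_t(x_{t-1}, \omega_t) := \min & g_t(x_t, \omega_t) + \mathbb{E}[Q_{t+1}(x_t, \omega_{t+1})] \\
     & x_t \in \mathcal{X}_t(x_{t-1}, \omega_t),
\end{array}
$$

for $t = 2, \dots, T - 1$, and the last-stage recourse term $Q_T(x_{T-1}, \omega_T)$ is of the
form


$$
\begin{array}{rl}
Q_T(x_{T-1}, \omega_T) := \min & g_T(x_T, \omega_T) \\
     & x_T \in \mathcal{X}_T(x_{T-1}, \omega_T).
\end{array}
$$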


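The multi-stage optimization problem (16.1) can also be written as


$$\min_{x_0 \in \mathcal{X}_0} g_0(x_0) + \mathbb{E}\left[\min_{x_1 \in \mathcal{X}_1(x_0, \omega_1)} g_1(x_1, \omega_1) + \cdots + \mathbb{E}\left[\min_{x_T \in \mathcal{X}_T(x_{T-1}, \omega_T)} g_T(x_T, \omega_T)\right]\right].$$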


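Consider the special case of linear multi-stage stochastic optimization, where the
components are linear. More precisely, each $g_t(x_t, \omega_t) = c_t^{\mathsf{T}} x_t$ for some vector $c_t$
and each inter-temporal constraint $x_t \in \mathcal{X}_t(x_{t-1}, \omega_t)$ is of the form


$$B_t x_{t-1} + A_t x_t = b_t, \quad x_t \ge 0,$$


where $\omega_t = (c_t, A_t, B_t, b_t)$ is only revealed at stage $t$. In this case we often write


$$
\begin{array}{rl}
\min & \mathbb{E}\left[c_0^{\mathsf{T}} x_0 + c_1^{\mathsf{T}} x_1 + \cdots + c_T^{\mathsf{T}} x_T\right] \\
\text{s.t.} & A_0 x_0 = b_0 \\
     & B_1 x_0 + A_1 x_1 = b_1 \\
     & B_2 x_1 + A_2 x_2 = b_2 \\
     & \qquad \ddots \\
     & B_T x_{T-1} + A_T x_T = b_T \\
     & x_0, x_1, \dots, x_T \ge 0.
\end{array}
\tag{16.2}
$$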


Example 16.1 (Financial planning example) Assume an investor has initial
wealth $W_0$ at $t = 0$. At stage $t$ she can invest in two asset classes: bonds and
stocks. The (random) gross return on bonds from time $t - 1$ to $t$ is $R_{b,t}$ and the
(random) gross return on stocks from $t - 1$ to $t$ is $R_{s,t}$. Assume that the investor
needs to meet liabilities $L_t$ at times $t = 1, \dots, T$. She wants to maximize her
expected wealth at time $T$ (after covering the liabilities). Assume no shorting is
allowed.


Formulation of the financial planning example


Variables:
$x_t$: amount of money invested in bonds at stage $t$, for $t = 0, \dots, T - 1$;
$y_t$: amount of money invested in stocks at stage $t$, for $t = 0, \dots, T - 1$;
$W_T$: wealth at time $T$


$$
\begin{array}{rll}
\max & \mathbb{E}(W_T) & \\
\text{s.t.} & x_0 + y_0 = W_0 & \\
     & R_{b,t}\,x_{t-1} + R_{s,t}\,y_{t-1} = L_t + x_t + y_t, & t = 1, \dots, T - 1 \\
     & R_{b,T}\,x_{T-1} + R_{s,T}\,y_{T-1} = L_T + W_T & \\
     & x_t, y_t \ge 0, & t = 0, 1, \dots, T - 1 \\
     & W_T \ge 0. &
\end{array}
$$

Notice that in this model the parameters $R_{b,t}$ and $R_{s,t}$ are unknown
prior to time $t$.


Scenario Optimization 


As discussed in Chapter 10 for the two-stage case, a multi-stage optimization
model can be recast as a deterministic equivalent if each of the random outcomes
has a discrete distribution. In this case for each stage $t = 1, 2, \dots, T$
there is a finite set of possible values or realizations $\{\omega_t^1, \dots, \omega_t^{S_t}\}$ for the random
outcome $\omega_t$. These sets of realizations can be described by an event tree as
depicted in Figure 16.1 for a problem with three stages. In this particular tree,
the random variables $\omega_1$ and $\omega_2$ have two- and five-valued discrete distributions
respectively. The tree structure is associated with the discrete filtration generated
by the discrete-time random process $\omega_{[t]} := (\omega_1, \omega_2, \dots, \omega_t)$, $t = 1, \dots, T$.
In particular, each possible value of $\omega_2$ has a unique predecessor value of $\omega_1$. We
further elaborate on this tree structure below.


Figure 16.1 Event tree for a three-stage model


The set of scenarios described by the event tree in turn yields a deterministic
equivalent of a multi-stage stochastic optimization model. We next illustrate 
this equivalence for the stochastic optimization model described in Example 16.1 
in a particularly simple event tree. We subsequently describe the deterministic 
equivalent for linear multi-stage stochastic programs in more general event trees. 


Example 16.2 (Financial planning revisited) Consider the model described in
Example 16.1. Suppose $T = 2$ and there are two equally likely outcomes (“H”
and “T”) for the joint returns $(R_{b,t}, R_{s,t})$ over each period, say


$$(R_{b,t}(H), R_{s,t}(H)) = (1.14, 1.25) \quad \text{and} \quad (R_{b,t}(T), R_{s,t}(T)) = (1.1, 1.06).$$


Figure 16.2 illustrates the corresponding scenario tree. In this event tree the
labels “H” and “T” on the edges indicate the specific outcome between two
consecutive stages. Observe that each of the four scenarios HH, HT, TH, TT in
the event tree occurs with probability 1/4.


Figure 16.2 Event tree for financial planning model


The scenario tree in turn yields a deterministic equivalent formulation for the 
financial planning stochastic optimization model. In the deterministic equivalent 
the stage 0 decisions are made at the root of the tree and thus may not depend 
on any of the random outcomes. The stage 1 decisions may depend on the initial 
path H or T realized up to stage 1 in the event tree. Finally, the stage 2 decisions 
may depend on the path HH, HT, TH, or TT realized up to stage 2 in the event 
tree. The corresponding adaptiveness of the variables and constraints is explicitly 
reflected in the following deterministic equivalent formulation. 


Scenario optimization model

Variables:
$x_0, y_0$: money in bonds and stocks at $t = 0$;
$x_1(H), x_1(T), y_1(H), y_1(T)$: money in bonds and stocks at $t = 1$;
$W_2(HH), W_2(HT), W_2(TH), W_2(TT)$: wealth at $t = 2$


$$
\begin{array}{rll}
\max & \frac{1}{4}\left(W_2(HH) + W_2(HT) + W_2(TH) + W_2(TT)\right) & \\
\text{s.t.} & x_0 + y_0 = W_0 & \text{(stage 0)} \\
     & 1.14\,x_0 + 1.25\,y_0 = L_1 + x_1(H) + y_1(H) & \text{(stage 1, path H)} \\
     & 1.1\,x_0 + 1.06\,y_0 = L_1 + x_1(T) + y_1(T) & \text{(stage 1, path T)} \\
     & 1.14\,x_1(H) + 1.25\,y_1(H) = L_2 + W_2(HH) & \text{(stage 2, HH)} \\
     & 1.1\,x_1(H) + 1.06\,y_1(H) = L_2 + W_2(HT) & \text{(stage 2, HT)} \\
     & 1.14\,x_1(T) + 1.25\,y_1(T) = L_2 + W_2(TH) & \text{(stage 2, TH)} \\
     & 1.1\,x_1(T) + 1.06\,y_1(T) = L_2 + W_2(TT) & \text{(stage 2, TT)} \\
     & x_0, y_0, x_1(H), x_1(T), y_1(H), y_1(T) \ge 0 & \\
     & W_2(HH), W_2(HT), W_2(TH), W_2(TT) \ge 0. &
\end{array}
$$
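
To see how the scenario constraints propagate wealth along each path, it can help to evaluate a candidate plan by hand or in a few lines of code. The sketch below does not solve the linear program; it only evaluates a given plan on the event tree. The names `terminal_wealth` and `expected_wealth`, the encoding of a plan as a stock fraction per path, and the numerical values of $W_0$, $L_1$, $L_2$ are all assumptions made here for illustration.

```python
from itertools import product

# Gross returns (bonds, stocks) for the two outcomes of Example 16.2.
RETURNS = {"H": (1.14, 1.25), "T": (1.10, 1.06)}


def terminal_wealth(w0, liabilities, stock_share):
    """Evaluate a candidate plan on the event tree.

    stock_share maps each non-terminal path ("", "H", "T", ...) to the
    fraction of available wealth placed in stocks; the rest goes to bonds.
    Decisions depend only on the path so far, so they are non-anticipatory.
    Returns a dict mapping each scenario (e.g. "HT") to terminal wealth.
    """
    result = {}
    for scenario in product("HT", repeat=len(liabilities)):
        wealth = w0
        for t, outcome in enumerate(scenario):
            share = stock_share["".join(scenario[:t])]
            x, y = (1 - share) * wealth, share * wealth  # bonds, stocks
            r_bond, r_stock = RETURNS[outcome]
            wealth = r_bond * x + r_stock * y - liabilities[t]
            if wealth < 0:
                raise ValueError("plan violates a nonnegativity constraint")
        result["".join(scenario)] = wealth
    return result


def expected_wealth(scenario_wealth):
    # All scenarios are equally likely in this example.
    return sum(scenario_wealth.values()) / len(scenario_wealth)


# Hypothetical data: W_0 = 100, L_1 = L_2 = 10, everything kept in bonds.
plan = {"": 0.0, "H": 0.0, "T": 0.0}
wealth = terminal_wealth(100.0, [10.0, 10.0], plan)
print(wealth, expected_wealth(wealth))
```

Any linear programming solver applied to the model above effectively searches over all such plans; the evaluation routine is useful mainly for checking a solver's output against the constraints.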
The above scenario optimization approach is quite flexible. In particular,
consider a variation of the above financial planning model where the objective
is $\max \mathbb{E}(U(W_T))$ for some concave utility function $U(W)$. The corresponding
deterministic equivalent has exactly the same variables and constraints as the
one above and the following objective:


$$\max \frac{1}{4}\left(U(W_2(HH)) + U(W_2(TH)) + U(W_2(HT)) + U(W_2(TT))\right).$$


Furthermore, if U(-) is piecewise linear, then the problem can be recast as a 
linear program. (See Exercise 16.1.) 

Consider now the general multi-stage linear stochastic program (16.2). Suppose
each random vector $\omega_t = (c_t, A_t, B_t, b_t)$ has a discrete distribution and consider
their event tree representation. The description of the deterministic equivalent
relies on the following notation. Let


$$\Omega_t = \{\omega_t^k = (c_t^k, A_t^k, B_t^k, b_t^k) : k = 1, \dots, S_t\}$$


be the set of possible realizations of the random variable $\omega_t$ for some integer $S_t \ge 1$
and for each stage $t = 1, \dots, T$. Let $p_t^k = \mathbb{P}(\omega_t = \omega_t^k)$, with $k = 1, \dots, S_t$,
$t = 1, \dots, T$. The set $\Omega_t = \{\omega_t^1, \dots, \omega_t^{S_t}\}$ corresponds to the nodes in layer $t$ of the
event tree, which can be conveniently denoted $(t, 1), \dots, (t, S_t)$ as illustrated in
Figure 16.3.


Figure 16.3 Event tree for a three-stage model with node labels


Observe that in the event tree there is always a single root node $(0, 1)$ in layer 0.
Furthermore, for each $t = 1, \dots, T - 1$ each node $(t, k)$ has a unique predecessor
$(t - 1, \hat{k})$ in the immediately preceding layer of the tree. For instance, in the event
tree depicted in Figure 16.3 the predecessor of each of the nodes in layer 2 is a
unique node in layer 1 as follows:


$$\hat{1} = \hat{2} = 1, \quad \hat{3} = \hat{4} = \hat{5} = 2.$$


Observe that the probability of a non-terminal node equals the combined probability
of its direct descendants; that is, for $t = 1, \dots, T$ and for every node
$(t - 1, \ell)$ we have

$$p_{t-1}^{\ell} = \sum_{(t,k)\,:\,\hat{k} = \ell} p_t^k.$$


Consider the multi-stage linear stochastic program (16.2) and assume the 
random outcomes are described via a suitable event tree. We next detail the 
variables, objective, and constraints of the corresponding deterministic linear 
program equivalent. 


Variables: Stage 0 variables: $x_0$.
Stage $t$ variables can be adapted to the $S_t$ possible paths up to stage
$t$; that is,


$$x_t^k, \quad k = 1, \dots, S_t.$$


Objective: The deterministic equivalent of the objective function


$$\min \mathbb{E}\left[c_0^{\mathsf{T}} x_0 + c_1^{\mathsf{T}} x_1 + \cdots + c_T^{\mathsf{T}} x_T\right] = \min \mathbb{E}\left[c_0^{\mathsf{T}} x_0 + \sum_{t=1}^{T} c_t^{\mathsf{T}} x_t\right]$$

is

$$\min\; c_0^{\mathsf{T}} x_0 + \sum_{t=1}^{T} \sum_{k=1}^{S_t} p_t^k\,(c_t^k)^{\mathsf{T}} x_t^k.$$


Constraints: The deterministic equivalent of each inter-temporal constraint

$$B_t x_{t-1} + A_t x_t = b_t, \quad x_t \ge 0,$$

is the set of constraints

$$B_t^k x_{t-1}^{\hat{k}} + A_t^k x_t^k = b_t^k, \quad x_t^k \ge 0, \quad \text{for } k = 1, \dots, S_t.$$


Observe that these constraints link the stage $t$ variables $x_t^k$ associated
with the layer $t$ nodes $(t, k)$ with the variable associated with their
predecessor $(t - 1, \hat{k})$.


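Thus the complete deterministic equivalent of (16.2) is as follows:


$$
\begin{array}{rll}
\min & c_0^{\mathsf{T}} x_0 + \displaystyle\sum_{k=1}^{S_1} p_1^k (c_1^k)^{\mathsf{T}} x_1^k + \cdots + \sum_{k=1}^{S_T} p_T^k (c_T^k)^{\mathsf{T}} x_T^k & \\
\text{s.t.} & A_0 x_0 = b_0 & \\
     & B_1^k x_0 + A_1^k x_1^k = b_1^k, & k = 1, \dots, S_1 \\
     & B_2^k x_1^{\hat{k}} + A_2^k x_2^k = b_2^k, & k = 1, \dots, S_2 \\
     & \qquad \vdots & \\
     & B_T^k x_{T-1}^{\hat{k}} + A_T^k x_T^k = b_T^k, & k = 1, \dots, S_T \\
     & x_0, x_1^k, \dots, x_T^k \ge 0. &
\end{array}
\tag{16.3}
$$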
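For example, if $T = 2$ and the event tree is as depicted in Figure 16.3, then
the deterministic equivalent is


$$
\begin{array}{rl}
\min & c_0^{\mathsf{T}} x_0 + p_1^1 (c_1^1)^{\mathsf{T}} x_1^1 + p_1^2 (c_1^2)^{\mathsf{T}} x_1^2 + p_2^1 (c_2^1)^{\mathsf{T}} x_2^1 + p_2^2 (c_2^2)^{\mathsf{T}} x_2^2 + p_2^3 (c_2^3)^{\mathsf{T}} x_2^3 \\
     & \quad + \; p_2^4 (c_2^4)^{\mathsf{T}} x_2^4 + p_2^5 (c_2^5)^{\mathsf{T}} x_2^5 \\
\text{s.t.} & A_0 x_0 = b_0 \\
     & B_1^1 x_0 + A_1^1 x_1^1 = b_1^1 \\
     & B_1^2 x_0 + A_1^2 x_1^2 = b_1^2 \\
     & B_2^1 x_1^1 + A_2^1 x_2^1 = b_2^1 \\
     & B_2^2 x_1^1 + A_2^2 x_2^2 = b_2^2 \\
     & B_2^3 x_1^2 + A_2^3 x_2^3 = b_2^3 \\
     & B_2^4 x_1^2 + A_2^4 x_2^4 = b_2^4 \\
     & B_2^5 x_1^2 + A_2^5 x_2^5 = b_2^5 \\
     & x_0, x_1^1, x_1^2, x_2^1, x_2^2, x_2^3, x_2^4, x_2^5 \ge 0.
\end{array}
$$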


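Observe that the constraint matrix in the above model has the following
structure:

$$
\begin{bmatrix}
A_0 & & & & & & & \\
B_1^1 & A_1^1 & & & & & & \\
B_1^2 & & A_1^2 & & & & & \\
 & B_2^1 & & A_2^1 & & & & \\
 & B_2^2 & & & A_2^2 & & & \\
 & & B_2^3 & & & A_2^3 & & \\
 & & B_2^4 & & & & A_2^4 & \\
 & & B_2^5 & & & & & A_2^5
\end{bmatrix}
$$


The constraint matrix for the general deterministic equivalent (16.3) has a similar
type of structure.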
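
The block pattern of the constraint matrix is determined entirely by the predecessor relation of the event tree. The sketch below makes this explicit; it is an illustration under assumptions made here (the name `block_pattern`, the encoding of the tree as one list of 1-based parent indices per layer, and the string labels for the blocks), not a construction from the text.

```python
def block_pattern(predecessors):
    """Return the block layout of the deterministic-equivalent matrix.

    predecessors[t - 1][k - 1] is the 1-based index, within layer t - 1,
    of the predecessor of node (t, k). Each row of the result describes one
    block row; "." marks a zero block.
    """
    layer_sizes = [1] + [len(layer) for layer in predecessors]
    # offsets[t] is the first block column used by the nodes of layer t.
    offsets = [0]
    for size in layer_sizes[:-1]:
        offsets.append(offsets[-1] + size)
    n_blocks = sum(layer_sizes)

    rows = [["A0"] + ["."] * (n_blocks - 1)]
    for t, layer in enumerate(predecessors, start=1):
        for k, parent in enumerate(layer, start=1):
            row = ["."] * n_blocks
            row[offsets[t - 1] + parent - 1] = f"B{t}^{k}"  # predecessor column
            row[offsets[t] + k - 1] = f"A{t}^{k}"            # own column
            rows.append(row)
    return rows


# Tree of Figure 16.3: two nodes in layer 1, five nodes in layer 2 whose
# predecessors are 1, 1, 2, 2, 2.
pattern = block_pattern([[1, 1], [1, 1, 2, 2, 2]])
for row in pattern:
    print(" ".join(f"{entry:5s}" for entry in row))
```

Each block row has at most two nonzero blocks, which is the sparsity that decomposition methods for (16.3) exploit.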

There are alternative ways of labeling the nodes and branches in the event 
tree. In particular, the simple binary branching in Example 16.2 readily suggests 
the natural edge labeling depicted in Figure 16.2. In that case the nodes in layer 
t can alternatively be labeled via the t-long sequence of labels along the path 
from the root node. This kind of edge labeling may appear more intuitive but 
it could become cumbersome in other cases when there are different numbers of 
branches at each non-terminal node. 

Note that the size of the deterministic equivalent of a multi-stage stochastic 
program increases rapidly with the number of stages. For example, for a problem 
with 11 stages and a binary event tree, there are $2^{10} = 1024$ scenarios and
therefore the linear program (16.3) may have several thousand constraints and
variables, depending on the number of variables and constraints at each node.
Modern commercial codes can handle such large linear programs, but a moderate
increase in the number of stages or in the number of branches at each stage could
make (16.3) too large to solve by standard linear programming solvers. When
this happens, it is critical to exploit the special structure of (16.3) to solve the 
model efficiently. 

The Benders decomposition (or L-shaped method) introduced in Section 10.5 
can also be used for multi-stage problems (16.3) in a straightforward way: The 
stages are partitioned into a first set that gives rise to the “master problem” 
and a second set that gives rise to the “recourse problems”. For example in a 
six-stage problem, the variables of the first two stages could define the master 
problem. When these variables are fixed, (16.3) decomposes into separate linear 
programs each involving variables of the last four stages. The solutions of these 
recourse linear programs provide optimality or feasibility cuts that can be added 
to the master problem. As discussed in Section 10.5, upper and lower bounds 
are computed at each iteration and the algorithm stops when the difference 
drops below a given tolerance. Using this approach, Gondzio and Kouwenberg 
(2001) were able to solve an asset liability management problem with over four 
million scenarios, whose linear programming formulation (16.3) had 12 million 
constraints and 24 million variables. This linear program was so large that storage 
space on the computer became an issue. The scenario tree had six levels and 
13 branches at each node. In order to apply Benders’ decomposition, Gondzio and 
Kouwenberg divided the six-period problem into a first-stage problem containing 
the first three periods and a second-stage problem containing periods four to six. 
This resulted in 2197 recourse linear programs, each involving 2197 scenarios. 
These recourse linear programs were solved by an interior-point algorithm. Note 
that Benders’ decomposition is ideally suited for parallel computations since the 
recourse linear programs can be solved simultaneously. When the solution of all 
the recourse linear programs is completed (which takes the bulk of the time), the 
master problem is then solved on one processor while the other processors remain 
idle temporarily. Gondzio and Kouwenberg tested a parallel implementation on 
a computer with 16 processors and they obtained an almost perfect speedup, 
that is a speedup factor of almost k when using k processors. 
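
The scenario counts quoted in this section follow directly from the branching structure of the tree, and a short computation makes the growth concrete. The helper below is a sketch; the name `tree_size` and the list encoding of the branching structure are assumptions made here.

```python
def tree_size(branching):
    """Return (scenarios, nodes) for an event tree.

    branching[t] is the number of children of every node in layer t, so the
    tree has len(branching) periods. Scenarios are the leaves of the tree;
    nodes counts every node, including the root.
    """
    layer = 1      # number of nodes in the current layer
    nodes = 1      # the root
    for b in branching:
        layer *= b
        nodes += layer
    return layer, nodes


# Binary tree with 11 stages (10 periods).
print(tree_size([2] * 10))
# Six levels with 13 branches at each node, and the three-period subtrees
# obtained when the problem is split after the third period.
print(tree_size([13] * 6)[0], tree_size([13] * 3)[0])
```

Since (16.3) has one block of variables and one block of constraints per node, the node count, rather than the scenario count alone, drives the size of the linear program.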


Scenario Generation 


A key aspect of multi-stage stochastic programming is the generation of scenarios 
so that the deterministic equivalent formulation (16.3) accurately represents the 
underlying stochastic optimization problem. 

There are two separate issues. First, one needs to model the correlation over 
time among the random parameters. For a pension fund, such a model might 
relate wage inflation (a random parameter that influences the liability side) to 
interest rates and stock prices (random parameters that influence the asset side). 
Below we discuss a simple autoregressive model that can be used for this purpose. 
A second issue is the construction of a scenario tree from these inter-temporal 
statistical models: A finite number of scenarios must reflect as accurately as 
possible the random processes modeled in the previous step, suggesting the need
for a large number of scenarios. On the other hand, the linear program (16.3) can 
only be solved if the size of the scenario tree is reasonable, suggesting a limited 
number of scenarios. To reconcile these two conflicting objectives, it might be 
crucial to use variance reduction techniques. We address these issues in this 
section. 


Autoregressive Model 


In order to generate the random parameters underlying the stochastic program, 
one needs to construct an economic model reflecting the correlation between the 
parameters. Historical data may be available. The goal is to generate meaningful 
time series for constructing the scenarios. One approach is to use an autoregres- 
sive model. 

Specifically, if $r_t$ denotes the random vector of parameters in period $t$, an
autoregressive model is defined by


$$r_t = D_0 + D_1 r_{t-1} + \cdots + D_p r_{t-p} + \epsilon_t,$$


where $p$ is the number of lags used in the regression, $D_0, D_1, \dots, D_p$ are
time-independent constant matrices, which are estimated through statistical methods
such as maximum likelihood, and $\epsilon_t$ is a vector of i.i.d. random disturbances
with mean zero.

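To illustrate this, consider a problem where the vector $r_t$ consists of three
random parameters: $s_t$, $b_t$, and $m_t$ are the rates of return of stocks, bonds, and
the money market, respectively, in year $t$. An autoregressive model with $p = 1$
has the form:


$$
\begin{bmatrix} s_t \\ b_t \\ m_t \end{bmatrix}
=
\begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix}
+
\begin{bmatrix} d_{11} & d_{12} & d_{13} \\ d_{21} & d_{22} & d_{23} \\ d_{31} & d_{32} & d_{33} \end{bmatrix}
\begin{bmatrix} s_{t-1} \\ b_{t-1} \\ m_{t-1} \end{bmatrix}
+
\begin{bmatrix} \epsilon_t^s \\ \epsilon_t^b \\ \epsilon_t^m \end{bmatrix},
\quad t = 2, \dots, T.
$$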


Assuming independent error terms $\epsilon_t^s$, $\epsilon_t^b$, and $\epsilon_t^m$, and using historical data,
one can find the parameters $d_1, d_{11}, d_{12}, d_{13}$ in the first equation,


$$s_t = d_1 + d_{11} s_{t-1} + d_{12} b_{t-1} + d_{13} m_{t-1} + \epsilon_t^s,$$


using standard linear regression tools that minimize the sum of squared errors
$\epsilon_t^s$. Useful statistics, such as the standard error $\sigma_s$ of the estimates $s_t$, can also
be obtained. Similarly for $b_t$ and $m_t$.


Constructing Scenario Trees 


The random distributions relating the various parameters of a stochastic pro- 
gram must be discretized to generate a set of scenarios that is adequate for its 
deterministic equivalent. Too few scenarios may lead to approximation errors. 
On the other hand, too many scenarios will lead to an explosion in the size of
the scenario tree, leading to an excessive computational burden. In this section,
we discuss a simple random sampling approach and two variance reduction tech- 
niques: adjusted random sampling and tree fitting. Unfortunately, scenario trees 
constructed by these methods could contain spurious arbitrage opportunities. 
We end this section with a procedure to test that this does not occur. 


Random Sampling 
One can generate scenarios directly from the autoregressive model introduced in 
the previous section: 


$$r_t = D_0 + D_1 r_{t-1} + \cdots + D_p r_{t-p} + \epsilon_t,$$


where the $\epsilon_t \sim N(0, \Sigma)$ are independent multivariate normal random vectors
with mean 0 and covariance matrix $\Sigma$.

In our example with three random parameters $s_t$, $b_t$, and $m_t$, and independent
error terms $\epsilon_t^s$, $\epsilon_t^b$, $\epsilon_t^m$, the matrix $\Sigma$ is a $3 \times 3$ diagonal matrix, with diagonal
entries $\sigma_s^2$, $\sigma_b^2$, $\sigma_m^2$. Thirty branches or so may be needed to get a reasonable
approximation of the distribution of the rates of return in stage 1. For a problem 
with three stages, 30 branches at each stage represent 27,000 scenarios. With 
more stages, the size of the linear program (16.3) explodes. Kouwenberg (2001) 
performed tests on scenario trees with fewer branches at each node (such as a 
five-stage problem with branching structure 10-6-6-4-4, meaning ten branches
at the root, then six branches at each node in the next stage and so on) and 
he concluded that random sampling on such trees leads to unstable investment 
strategies. This occurs because the approximation error made by representing 
parameter distributions by random samples can be significant in a small sce- 
nario tree. As a result the optimal solution of (16.3) is not optimal for the 
actual parameter distributions. How can one construct a scenario tree that more 
accurately represents these distributions, without blowing up the size of the 
linear program (16.3)? 
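
To make the random sampling step concrete, here is a minimal Python sketch of drawing the branches at one node. It assumes a first-order version of the autoregressive model with a diagonal covariance matrix; the function names `sample_branches` and `num_scenarios`, the coefficient values, and the use of standard deviations as inputs are illustrative assumptions, not values from the text.

```python
import random

def sample_branches(r_prev, d0, d1, sigmas, num_branches, rng):
    """Draw branch values r_t = d0 + D1 r_{t-1} + eps with independent normal errors.

    r_prev : parameter values at the parent node
    d0     : intercepts
    d1     : square matrix (list of rows) of lag-1 coefficients
    sigmas : standard deviations of the independent error terms
    """
    n = len(r_prev)
    # Conditional mean of r_t given the parent node.
    mean = [d0[i] + sum(d1[i][j] * r_prev[j] for j in range(n)) for i in range(n)]
    branches = []
    for _ in range(num_branches):
        eps = [rng.gauss(0.0, sigmas[i]) for i in range(n)]
        branches.append([mean[i] + eps[i] for i in range(n)])
    return branches

def num_scenarios(branching):
    """Number of root-to-leaf paths of a tree with the given branching structure."""
    total = 1
    for b in branching:
        total *= b
    return total

# Hypothetical coefficients for three random parameters.
D0 = [0.02, 0.01, 0.005]
D1 = [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.3]]
SIGMAS = [0.15, 0.05, 0.01]

branches = sample_branches([0.05, 0.03, 0.01], D0, D1, SIGMAS, 30, random.Random(0))
print(len(branches), num_scenarios([30, 30, 30]), num_scenarios([10, 6, 6, 4, 4]))
```

The second helper only counts paths: three stages with 30 branches each give 27,000 scenarios, whereas the 10-6-6-4-4 structure gives 5,760.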


Adjusted Random Sampling 

An easy way of improving upon random sampling is as follows. Assume that each 
node of the scenario tree has an even number $K = 2k$ of branches. Instead of 
generating $2k$ random samples from the autoregressive model, generate $k$ random 
samples only and use the negative of their error terms to compute the values on 
the remaining $k$ branches. Because the resulting error terms are symmetric about 
zero, this fits all the odd moments of the error distribution exactly. In order to 
fit the variance of the distributions as well, one can scale the sampled values: 
all of them are multiplied by a common factor chosen so that their variance 
matches that of the corresponding parameter. 
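
The two steps above can be sketched in a few lines of Python for a single error term. This is only an illustration under stated assumptions: the function name `adjusted_sample` is hypothetical, the branches are taken to be equally likely, and the scaling is applied to the error terms rather than to the sampled parameter values.

```python
import random

def adjusted_sample(k, sigma, rng):
    """Return 2k error terms with zero odd sample moments and variance sigma**2."""
    half = [rng.gauss(0.0, sigma) for _ in range(k)]
    # Antithetic step: mirror each sampled error term.
    errors = half + [-e for e in half]
    # Sample variance about the zero mean, with probability 1/(2k) per branch.
    sample_var = sum(e * e for e in errors) / len(errors)
    # Scaling step: one multiplicative factor makes the variance equal sigma**2.
    scale = sigma / sample_var ** 0.5
    return [scale * e for e in errors]

errors = adjusted_sample(5, 0.2, random.Random(42))
mean = sum(errors) / len(errors)
variance = sum(e * e for e in errors) / len(errors)
third_moment = sum(e ** 3 for e in errors) / len(errors)
print(mean, variance, third_moment)
```

Up to floating-point rounding, the mean and third moment are zero and the variance equals the target 0.04, whatever the random seed.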


Tree Fitting 

How can one best approximate a continuous distribution by a discrete distribution 
with $K$ values? In other words, how should one choose values $v_k$ and their 
probabilities $p_k$, for $k = 1,\dots,K$, in order to approximate the given distribution 
as accurately as possible? A natural answer is to match as many of the moments 
as possible. In the context of a scenario tree, the problem is somewhat more 
complicated since there are several correlated parameters at each node and there 
is interdependence between periods as well. Hoyland and Wallace (2001) propose 
to formulate this fitting problem as a nonlinear program. The fitting problem 
can be solved either at each node separately or on the overall tree. We explain 
the fitting problem at a node. Let $S_l$ be the values of the statistical properties 
of the distributions that one desires to fit, for $l = 1,\dots,s$. These might be 
the expected values of the distributions, the correlation matrix, the skewness, 
and kurtosis. Let $v_k$ and $p_k$ denote the vector of values on branch $k$ and its 
probability, respectively, for $k = 1,\dots,K$. Let $f_l(v,p)$ be the mathematical 
expression of property $l$ for the discrete distribution (for example, the mean 
of the vectors $v_k$, their correlation, skewness, and kurtosis). Each property has 
a positive weight $w_l$ indicating its importance in the desired fit. Hoyland and 
Wallace formulate the fitting problem as 


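$$
\begin{aligned}
\min_{v,p}\ & \sum_{l=1}^{s} w_l \left( f_l(v,p) - S_l \right)^2 \\
\text{s.t.}\ & \sum_{k=1}^{K} p_k = 1 \\
& p \ge 0.
\end{aligned}
\tag{16.4}
$$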


One might want some statistical properties to match exactly. As an example, 
consider again the autoregressive model: 


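$$r_t = D_0 + D_1 r_{t-1} + \cdots + D_p r_{t-p} + \epsilon_t,$$


where the error terms $\epsilon_t \sim N(0, \Sigma)$ are independent multivariate normal random 
vectors with mean 0 and covariance matrix $\Sigma$. To simplify notation, let us write $\epsilon$ 
instead of $\epsilon_t$. The random vector $\epsilon$ has distribution $N(0, \Sigma)$ and we would like 
to approximate this continuous distribution by a finite number of disturbance 
vectors $\epsilon^k$ occurring with probability $p_k$, for $k = 1,\dots,K$. Let $\epsilon_q^k$ denote the $q$th 
component of vector $\epsilon^k$, for $q = 1,\dots,l$, where $l$ is the number of components of $\epsilon$. 
One might want to fit the mean of $\epsilon$ exactly and its covariance matrix as well as 
possible. In this case, the fitting problem is 


$$
\begin{aligned}
\min_{\epsilon^1,\dots,\epsilon^K,\,p}\ & \sum_{q=1}^{l} \sum_{r=1}^{l} \left( \sum_{k=1}^{K} p_k \epsilon_q^k \epsilon_r^k - \Sigma_{qr} \right)^2 \\
\text{s.t.}\ & \sum_{k=1}^{K} p_k \epsilon^k = 0 \\
& \sum_{k=1}^{K} p_k = 1 \\
& p \ge 0.
\end{aligned}
$$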
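
As a small illustration of what this nonlinear program measures, the following Python sketch evaluates the objective and the mean constraint for a given candidate set of disturbance vectors. It does not solve the fitting problem; the function names and the two-dimensional target are hypothetical choices made for the example.

```python
def fitting_objective(points, probs, target_cov):
    """Sum of squared gaps between discrete second moments and target_cov."""
    dim = len(target_cov)
    total = 0.0
    for q in range(dim):
        for r in range(dim):
            second_moment = sum(p * e[q] * e[r] for e, p in zip(points, probs))
            total += (second_moment - target_cov[q][r]) ** 2
    return total

def mean_vector(points, probs):
    """Probability-weighted mean of the disturbance vectors."""
    dim = len(points[0])
    return [sum(p * e[i] for e, p in zip(points, probs)) for i in range(dim)]

# Hypothetical target: two uncorrelated disturbances with unit variance.
TARGET = [[1.0, 0.0], [0.0, 1.0]]
POINTS = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
UNIFORM = [0.25, 0.25, 0.25, 0.25]
SKEWED = [0.4, 0.1, 0.1, 0.4]

print(mean_vector(POINTS, UNIFORM), fitting_objective(POINTS, UNIFORM, TARGET))
print(mean_vector(POINTS, SKEWED), fitting_objective(POINTS, SKEWED, TARGET))
```

With uniform probabilities the four points match the mean and covariance exactly, so the objective is zero. The skewed probabilities still satisfy the mean constraint but induce a covariance of 0.6 between the two components, which the objective penalizes.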


Arbitrage-Free Scenario Trees 
Approximating the continuous distributions of the uncertain parameters by a 
finite number of scenarios in the linear program (16.3) typically creates modeling 
errors. In fact, if the scenarios are not chosen properly or if their number is 
too small, the supposed “linear programming equivalent” could be far from being 
equivalent to the original stochastic optimization problem. One of the most 
disturbing aspects of this phenomenon is the possibility of creating arbitrage 
opportunities when constructing the scenario tree. When this occurs, model 
(16.3) is flawed as it would be distorted by the arbitrage opportunities. Klaassen 
(2002) was the first to address this issue. In particular, he shows how arbitrage 
opportunities can be detected ex post in a scenario tree. See Exercise 16.3 for 
details. When arbitrage opportunities exist, a simple solution is to discard the 
scenario tree and to construct a new one with more branches. Klaassen also 
discusses what constraints to add to the nonlinear program (16.4) in order to 
preclude arbitrage opportunities ex ante. The additional constraints are nonlin- 
ear, thus increasing the difficulty of solving (16.4). 
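
For intuition only, the simplest special case of an ex post test can be coded directly. The sketch below assumes a node with exactly one risky asset and one riskless asset, both described by gross returns; in that case the node admits no arbitrage exactly when the riskless return lies strictly between the smallest and largest risky returns (or all returns coincide). The function name is hypothetical, and the general multi-asset test is the subject of Exercise 16.3.

```python
def node_is_arbitrage_free(risky_gross_returns, riskless_gross_return):
    """Arbitrage test at one node for one risky asset plus a riskless asset."""
    lo = min(risky_gross_returns)
    hi = max(risky_gross_returns)
    if lo == hi:
        # A risky asset with a single possible return must earn the riskless return.
        return lo == riskless_gross_return
    # Otherwise the riskless return must lie strictly inside the range of risky returns.
    return lo < riskless_gross_return < hi

print(node_is_arbitrage_free([2.0, 0.5], 1.25))   # risky asset can beat or trail cash
print(node_is_arbitrage_free([1.3, 1.4], 1.25))   # risky asset always beats cash
```

In the second call one can borrow at the riskless rate and buy the risky asset for a sure profit, so a sampled node of this kind would have to be rejected.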


Exercises 


Exercise 16.1 Consider the following variation of the “financial planning” 
problem discussed in Section 16.1: 


• Assume there are four stages, 0 through 3 (i.e., $T = 3$). 

• Over each period we have two equally likely outcomes for joint returns of 
bonds and stocks: (14%, 25%) and (10%, 6%). 

• Initial wealth $W_0 = 55$. $L_1 = L_2 = 0$. Final liability $L_3 = 70$. 


(a) Use a multi-stage scenario optimization approach to determine the sequence 
of investment decisions so that the liability is met at T = 3, and the expected 
value of the remaining wealth is maximized. Your investment decisions at 
each stage must be non-anticipatory. That is, they can only depend on the 
scenario path up to that stage. 

(b) Modify your model to solve the following variation: Instead of the single 
liability at stage 3, the following sequence of liabilities must be met at stages 
1, 2, 3: 

$L_1 = 20$, $L_2 = 20$, $L_3 = 25$. 


(c) Assume this time that $L_1 = L_2 = 0$ and that the final liability is $L_3 = 90$. 
Since this is too high, it is clear that, regardless of the investment decisions, 
the final wealth $W_3$ will be negative in some scenarios. Modify your model to 
maximize instead $E(U(W_3))$, where the utility function $U(W)$ is as follows: 


$$U(W) = \begin{cases} W & \text{if } W \ge 0 \\ 3W & \text{if } W < 0. \end{cases}$$


It is preferable for your model to be a linear program for computational 
purposes. For that purpose, you need to recast the objective 


$$\max\ E(U(W_3))$$

so that the resulting model is a linear program. To do so, observe that 
$U(W) = \min\{W, 3W\}$ and use a suitable set of new variables. 


Exercise 16.2 Consider the following dynamic portfolio problem. 


• At time $t = 0$ you have an initial endowment $W_0$. 

• At time $t = 0,1,\dots,T-1$ you invest a fraction $x_t$ of your wealth in a risky 
asset and the remaining fraction $1 - x_t$ in a risk-free asset. The risk-free and 
risky asset returns between $t$ and $t+1$ are $r_{f,t+1}$ and $r_{s,t+1}$ respectively. 

• At time $t = 1,\dots,T$ you receive an exogenous and deterministic income of $I_t$. 


Let $W_t$ denote your wealth at time $t = 0,1,\dots,T$. Your goal is to maximize 
utility of final wealth $E(U(W_T))$. 


(a) Write the law of motion for $W_t$. 

(b) Assume that between $t$ and $t+1$ there are two equally likely outcomes H 
and T. The risky return $r_{s,t+1}$ in each of these scenarios is as follows: 
• In outcome H: $r_{s,t+1} = 0.5$. 
• In outcome T: $r_{s,t+1} = -0.4$. 
Assume a zero risk-free return, $r_{f,t+1} = 0$, in both scenarios. 

Suppose $T = 2$. Write down the “law of motion” for $W_1$ and $W_2$ in each 
relevant scenario using the above numerical values for $r_{f,t+1}$, $r_{s,t+1}$. 

(c) Assume $W_0 = 1$ and $I_1 = I_2 = 0.1$. Use scenario optimization and Excel 
Solver or MATLAB to solve the two-period portfolio optimization problem 


$$\max_{x_0, x_1}\ E\left[ \frac{W_2^{1-\gamma}}{1-\gamma} \right]$$


for $\gamma = 0.4, 0.7, 0.9$. 

(d) Repeat part (c) but this time assume $W_0 = 1$ and $I_1 = I_2 = 0$. 

(e) Repeat part (c) but this time assume $W_0 = 1$ and $I_1 = I_2 = 0.2$. 

(f) Can you infer anything from the numerical results in (c), (d), and (e) about 
long-term investment when you know you will receive income along the 
investment horizon? 


Exercise 16.3 Recall from Section 4.2 in Chapter 4 that an arbitrage oppor- 
tunity is an opportunity to make money without any cost and without any risk. 
Consider a particular node at some stage $t - 1 \ge 0$ in a scenario tree whose set 
of immediate descendants in stage $t$ is $K$. For each $k \in K$ let $r^k \in \mathbb{R}^n$ denote the 
vector of asset returns of a set of $n$ assets realized in branch $k$ between stages 
$t-1$ and $t$. 


(a) Show that an arbitrage opportunity exists if there is an asset allocation 
$x = [x_1 \ \cdots \ x_n]^{\mathsf{T}}$ such that 

$$\sum_{j=1}^{n} x_j \le 0, \qquad \sum_{j=1}^{n} r_j^k x_j \ge 0 \ \text{ for all } k \in K,$$

where at least one inequality is strict. 
(b) Show that the condition in part (a) holds if and only if the following condition 
does not hold: there exist $y_k > 0$, for $k \in K$, such that 
$$\sum_{k \in K} y_k r^k = \mathbf{1}.$$
Hint: Apply the same kind of reasoning used in Section 4.2. 
(c) Use part (a) and part (b) above to modify the nonlinear program (16.4) in 
order to formulate a fitting problem at a node that does not contain any 
arbitrage opportunities. 


Stochastic Programming Models: 
Asset-Liability Management 


Asset-Liability Management 


The financial health of any company, and in particular of financial institutions, 
is reflected in the balance sheet of the company. Proper management of the 
company requires attention to both sides of the balance sheet — assets and 
liabilities. Asset—liability management offers sophisticated mathematical tools 
for an integrated management of assets and liabilities. 

Asset—liability management recognizes that static, one-period investment plan- 
ning models (such as mean-variance optimization) fail to incorporate the multi- 
period nature of the liabilities faced by the company. A multi-period model that 
emphasizes the need to meet liabilities in each period for a finite (or possibly 
infinite) horizon is often required. Since liabilities and asset returns usually have 
random components, their optimal management requires techniques to optimize 
under uncertainty. In particular, stochastic programming approaches have been 
effective for these kinds of problems. 

The main components of the asset—liability management problem are the 
stream of (random) liabilities faced by the firm, spread out over time, and the 
(random) returns of the assets that the firm may use for investments. Positions 
can be adjusted at each intermediate stage, adapting to the information revealed 
up to that stage. This is closely related to the financial planning example pre- 
sented in Example 16.1. 

The model assumes a planning horizon of $T$ periods. Let $R_{i,t}$ denote the gross 
return of asset $i$ between time $t-1$ and $t$, for $i = 1,\dots,n$ and $t = 1,\dots,T$. Let 
$L_t$ denote the liability at time $t = 1,\dots,T$. Suppose we want to maximize the 
expected wealth of the firm at time $T$. 


Multi-stage stochastic programming formulation 
Variables: 
$x_{i,t}$: amount invested in asset $i$ at time $t$, for $i = 1,\dots,n$ and $t = 0,1,\dots,T-1$. 
Objective: 


$$
\begin{aligned}
\max\ & E\left[ \sum_{i=1}^{n} R_{i,T}\, x_{i,T-1} - L_T \right] \\
\text{s.t.}\ & \sum_{i=1}^{n} R_{i,t}\, x_{i,t-1} = L_t + \sum_{i=1}^{n} x_{i,t}, \quad \text{for } t = 1,\dots,T-1 \\
& x_{i,t} \ge 0, \quad \text{for } i = 1,\dots,n \text{ and } t = 0,1,\dots,T-1.
\end{aligned}
$$


The equality constraint in this formulation states that the surplus left after 
liability $L_t$ is covered will be invested in the amounts $x_{i,t}$ in asset $i$ for 
$i = 1,\dots,n$. 

The objective selected in the model above is to maximize the expected wealth 
at the end of the planning horizon. In practice, one might have a different objec- 
tive. For example, in some cases, minimizing value at risk (VaR) or conditional 
value at risk (CVaR) might be more appropriate. Other priorities may dictate 
other objective functions. 

To address the issue of the most appropriate objective function, one must 
understand the role of liabilities. Pension funds and insurance companies are 
among the most typical arenas for the integrated management of assets and 
liabilities. 


The Case of an Insurance Company 


We consider the case of a Japanese insurance company, the Yasuda Fire and 
Marine Insurance Co. Ltd., following the work of Cariño et al. (1994). In this case, 
the liabilities are mainly savings-oriented policies issued by the company. Each 
new policy sold represents a deposit, or inflow of funds. Interest is periodically 
credited to the policy until maturity, typically three to five years, at which time 
the principal amount plus credited interest is refunded to the policyholder. The 
crediting rate is typically adjusted each year in relation to a market index like 
the prime rate. Therefore, we cannot say with certainty what the future liabilities 
will be. Insurance business regulations stipulate that interest credited to some 
policies be earned from investment income, not capital gains. So, in addition 
to ensuring that the maturity cash flows are met, the firm must seek to avoid 
interim shortfalls in income earned versus interest credited. In fact, it is the risk 
of not earning adequate income quarter by quarter that the decision makers view 
as the primary component of risk at Yasuda. 

The problem is to determine the optimal allocation of the deposited funds into 
several asset categories: cash, fixed-rate and floating-rate loans, bonds, equities, 
real estate, and other assets. Since we can revise the portfolio allocations over 
time, the decision we make is not just among allocations today but among 
allocation strategies over time. A realistic dynamic asset-liability model must 
also account for the payment of taxes. This is made possible by distinguishing 
between interest income and price return. 

A stochastic linear program is used to model the problem. The linear program 
has uncertainty in many coefficients. This uncertainty is modeled through a finite 
number of scenarios. In this fashion, the problem is transformed into a very 
large-scale linear program of the form (16.3). The random elements include price 
return and interest income for each asset class, as well as policy crediting rates. 
We next describe the main components of the multi-stage stochastic program- 
ming model. 


Stages: The stages of the model are indexed by $t = 0,1,\dots,T$. 


Variables: 
$x_{i,t}$ = market value in asset $i$ at stage $t$, for $i = 1,\dots,n$ and $t = 0,1,\dots,T$. 
$u_t$ = interest income shortfall at stage $t$, for $t = 1,\dots,T$. 
$v_t$ = interest income surplus at stage $t$, for $t = 1,\dots,T$. 


Random parameters in the stochastic linear program: 


$RP_{i,t}$ = price return of asset $i$ between stage $t-1$ and stage $t$, for $i = 1,\dots,n$ 
and $t = 1,\dots,T$. 

$RI_{i,t}$ = interest income of asset $i$ between stage $t-1$ and stage $t$, for 
$i = 1,\dots,n$ and $t = 1,\dots,T$. 

$F_t$ = deposit inflow between stage $t-1$ and stage $t$, for $t = 1,\dots,T$. 
$P_t$ = principal payout between stage $t-1$ and stage $t$, for $t = 1,\dots,T$. 
$I_t$ = interest payout between stage $t-1$ and stage $t$, for $t = 1,\dots,T$. 

$g_t$ = rate at which interest is credited to policies between stage $t-1$ and stage 
$t$, for $t = 1,\dots,T$. 

$L_t$ = liability valuation at stage $t$. 


Parameterized objective function components: 


$c_t(\cdot)$ = piecewise linear convex penalty for shortfall at time $t$. 


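The goal of the model is to allocate funds among available assets to maximize 
expected wealth at the end of the planning horizon $T$ minus the expected penalized 
shortfall accumulated through the planning horizon. The problem can be 
formulated as the following multi-stage stochastic program: 


$$
\begin{aligned}
\max\ & E\left[ \sum_{i=1}^{n} x_{i,T} - \sum_{t=1}^{T} c_t(u_t) \right] \\
\text{s.t.}\ & \sum_{i=1}^{n} x_{i,t} - \sum_{i=1}^{n} (1 + RP_{i,t} + RI_{i,t})\, x_{i,t-1} = F_t - P_t - I_t, \quad \text{for } t = 1,\dots,T \quad \text{(asset accumulation)} \\
& \sum_{i=1}^{n} RI_{i,t}\, x_{i,t-1} + u_t - v_t = g_t L_{t-1}, \quad \text{for } t = 1,\dots,T \quad \text{(interest income shortfall)} \\
& L_t = (1 + g_t) L_{t-1} + F_t - P_t - I_t, \quad \text{for } t = 1,\dots,T \quad \text{(liability accumulation)} \\
& x_{i,t} \ge 0, \quad u_t \ge 0, \quad v_t \ge 0.
\end{aligned}
\tag{17.1}
$$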


In the model discussed in Cariño et al. (1994), the stochastic linear program 
(17.1) is converted into a large linear program using a finite number of scenarios 
to deal with the random elements in the data. Creation of scenario inputs is 
made in stages using a tree. The tree structure can be described by the number 
of branches at each stage. For example, a 1-8-4-4-2-1 tree has $8 \times 4 \times 4 \times 2 = 256$ scenarios. 
Stage t = 0 is the initial stage. Stage t = 1 may be chosen to be the end of 
Quarter 1 and has eight different branches in this example. Stage t = 2 may be 
chosen to be the end of Year 1, with each of the previous eight branches giving rise 
to four new branches, and so on. For the Yasuda Fire and Marine Insurance Co. 
Ltd., a problem with seven asset classes and six stages gives rise to a stochastic 
linear program (17.1) with 12 constraints (other than non-negativity) and 54 
variables. Using 256 scenarios, this stochastic program is converted into a linear 
program with several thousand constraints and over 10,000 variables. Solving 
this model yielded extra income estimated to be about US $80 million per year 
for the company. 
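
The scenario count for a given branching structure is easy to check. The short Python sketch below (the function name `tree_profile` is a hypothetical helper) lists the number of nodes at each stage of the 1-8-4-4-2-1 tree, reading the leading 1 as the single root node at stage 0.

```python
def tree_profile(branching):
    """Number of nodes at each stage for a branching structure such as 1-8-4-4-2-1."""
    nodes = []
    count = 1
    for b in branching:
        # Every node of the previous stage gives rise to b nodes in this stage.
        count *= b
        nodes.append(count)
    return nodes

YASUDA_TREE = [1, 8, 4, 4, 2, 1]
nodes_per_stage = tree_profile(YASUDA_TREE)
print(nodes_per_stage)        # nodes at stages 0 through 5
print(nodes_per_stage[-1])    # number of scenarios (leaves)
print(sum(nodes_per_stage))   # total number of nodes in the tree
```

Since the deterministic equivalent carries a copy of the stage variables and constraints at each node, the total node count gives a rough sense of how the size of the linear program grows with the branching structure.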


Option Pricing via Stochastic Programming 


The option pricing problem discussed in Chapter 15 and modeled via the 
binomial lattice can alternatively be formulated as a stochastic programming 
problem. As should be expected, the two approaches are equivalent under the 
assumptions made for the binomial lattice model. However, there is additional 
flexibility in the stochastic programming approach that makes it applicable under 
less restrictive assumptions. In particular, we will discuss how the stochastic 
programming approach can easily model transaction costs. This is an important 
practical issue that cannot be incorporated in the binomial lattice model. 

We will work with a setting similar to that in Chapter 15. Let $S_t$, 
for $t = 0, 1,\dots, N$, denote the share price of a risky asset at times $t = 0,1,\dots, N$. 
Assume the economy also has a risk-free asset whose interest rate is $r$ in each 
period $[t-1,t]$ for $t = 1,\dots, N$. 


European Options 


Consider a European option contract that matures at time $N$ with payoff $g(S_N)$ 
for some function $g(\cdot)$ of the underlying asset price $S_N$. The following stochastic 
program provides a model for the lowest-cost portfolio of the underlying asset 
and the risk-free asset that can be constructed at time 0 and be subsequently 
rebalanced to super-replicate the payoff $g(S_N)$ of the European option contract. 


Variables: 

$x_t$ = amount of shares of the risky asset at time $t$, for $t = 0,\dots,N-1$. 

$y_t$ = amount of money in the risk-free asset at time $t$, for $t = 0,\dots,N-1$. 
Objective: 


$$
\begin{aligned}
\min\ & S_0 x_0 + y_0 \\
\text{s.t.}\ & S_N x_{N-1} + (1+r)\, y_{N-1} \ge g(S_N) \\
& S_t x_{t-1} + (1+r)\, y_{t-1} \ge S_t x_t + y_t, \quad t = 1,\dots,N-1.
\end{aligned}
\tag{17.2}
$$


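Consider the following binomial tree model for the risky prices. Assume that 
there are exactly two possible random outcomes (“H” and “T”) between time 
$t-1$ and $t$ for $t = 1,\dots,N$. For the simplest case $N = 1$, the binomial tree 
model yields the following deterministic equivalent of (17.2): 

$$
\begin{aligned}
\min\ & S_0 x_0 + y_0 \\
\text{s.t.}\ & S_1(H)\, x_0 + (1+r)\, y_0 \ge g(S_1(H)) \\
& S_1(T)\, x_0 + (1+r)\, y_0 \ge g(S_1(T)).
\end{aligned}
\tag{17.3}
$$

Observe that the linear programming dual of (17.3) is 

$$
\begin{aligned}
\max\ & g(S_1(H))\, v(H) + g(S_1(T))\, v(T) \\
\text{s.t.}\ & S_1(H)\, v(H) + S_1(T)\, v(T) = S_0 \\
& (1+r)\, v(H) + (1+r)\, v(T) = 1 \\
& v(H),\ v(T) \ge 0,
\end{aligned}
\tag{17.4}
$$

which in turn can be rewritten via the change of variables $p := (1+r)\,v(H)$ and 
$q := (1+r)\,v(T)$ as 

$$
\begin{aligned}
\max\ & \frac{1}{1+r} \left( g(S_1(H))\, p + g(S_1(T))\, q \right) \\
\text{s.t.}\ & \frac{1}{1+r} \left( S_1(H)\, p + S_1(T)\, q \right) = S_0 \\
& p + q = 1 \\
& p,\ q \ge 0.
\end{aligned}
\tag{17.5}
$$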

Without loss of generality assume $S_1(H) \ge S_1(T)$. Furthermore, assume 
$S_1(H) > S_1(T)$ as otherwise the pricing problem of the option contract is 
trivial. It follows that (17.5) is feasible if and only if $S_1(T) \le (1+r)S_0 \le S_1(H)$. 
In this case the only feasible solution to (17.5) is 

$$p = \frac{(1+r)S_0 - S_1(T)}{S_1(H) - S_1(T)}, \qquad q = \frac{S_1(H) - (1+r)S_0}{S_1(H) - S_1(T)},$$

and thus the optimal value of (17.3) and (17.4) is 

$$\frac{1}{1+r} \left( p\, g(S_1(H)) + q\, g(S_1(T)) \right).$$
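
The one-period duality argument can be checked numerically. The Python sketch below computes the risk-neutral probabilities, the dual value, and a primal portfolio obtained by making both super-replication constraints hold with equality, and confirms that the two values agree. The function name and the numbers used (a stock at 40 that moves to 80 or 20, $r = 0.25$, and a put struck at 50) are hypothetical choices for illustration.

```python
def one_period_price(s0, s_up, s_down, r, payoff):
    """Dual and primal solutions of the one-period super-replication problem.

    Assumes s_down <= (1 + r) * s0 <= s_up and s_down < s_up.
    """
    growth = 1.0 + r
    # Risk-neutral probabilities: the unique feasible solution of the dual.
    p = (growth * s0 - s_down) / (s_up - s_down)
    q = (s_up - growth * s0) / (s_up - s_down)
    dual_value = (p * payoff(s_up) + q * payoff(s_down)) / growth
    # Primal candidate: both super-replication constraints hold with equality.
    x0 = (payoff(s_up) - payoff(s_down)) / (s_up - s_down)
    y0 = (payoff(s_up) - s_up * x0) / growth
    primal_value = s0 * x0 + y0
    return p, q, dual_value, x0, y0, primal_value

def put_50(s):
    """Payoff of a put option with strike 50."""
    return max(50.0 - s, 0.0)

p, q, dual_value, x0, y0, primal_value = one_period_price(40.0, 80.0, 20.0, 0.25, put_50)
print(p, q, dual_value)
print(x0, y0, primal_value)
```

Because the primal portfolio is feasible and its cost equals the dual value, both are optimal by linear programming duality.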
Observe that this price is exactly the same (as it should be) as the one obtained 
via the one-period binomial model discussed in Section 4.3 when the single- 


period economy has no arbitrage. A similar duality argument shows that in the 
absence of arbitrage the stochastic programming model (17.2) is equivalent to 
the binomial lattice approach for the general multi-period case; that is, when 
N > 1. The following example illustrates this equivalence. 


Example 17.1 Suppose $N = 2$, $r = \frac{1}{4}$, and the prices $S_0$, $S_1$, $S_2$ of the risky 
asset are as indicated at the nodes of the binomial tree depicted in Figure 17.1. 
Assume that the two branches emerging from each node are equally likely. 
Determine the price of a European put option maturing at time $N = 2$ with 
strike price 50; that is, with payoff $g(S_2) = (50 - S_2)^+$. 

In this case the deterministic equivalent of (17.2) is 


$$
\begin{aligned}
\min\ & 40 x_0 + y_0 \\
\text{s.t.}\ & 80 x_0 + 1.25 y_0 \ge 80 x_1(H) + y_1(H) \\
& 20 x_0 + 1.25 y_0 \ge 20 x_1(T) + y_1(T) \\
& 160 x_1(H) + 1.25 y_1(H) \ge (50 - 160)^+ = 0 \\
& 40 x_1(H) + 1.25 y_1(H) \ge (50 - 40)^+ = 10 \\
& 40 x_1(T) + 1.25 y_1(T) \ge (50 - 40)^+ = 10 \\
& 10 x_1(T) + 1.25 y_1(T) \ge (50 - 10)^+ = 40.
\end{aligned}
$$


The optimal solution to this linear program is 


$$
\begin{aligned}
& x_1(H) = -0.0833, \quad y_1(H) = 10.6666, \quad x_1(T) = -1, \quad y_1(T) = 40, \\
& x_0 = -0.2666, \quad y_0 = 20.2666
\end{aligned}
\tag{17.6}
$$


and thus its optimal value is 9.6. 
On the other hand, the binomial lattice approach would yield the risk-neutral 
probabilities $p = q = \frac{1}{2}$. Consequently, the value $V_1(S_1)$ of the option at time 1 
is 

$$V_1(80) = \frac{1}{1.25} \cdot \frac{0 + 10}{2} = 4, \qquad V_1(20) = \frac{1}{1.25} \cdot \frac{10 + 40}{2} = 20,$$

and the value $V_0(S_0)$ of the option at time 0 is 

$$V_0(40) = \frac{1}{1.25} \cdot \frac{4 + 20}{2} = 9.6.$$


The (super-)replicating portfolio (17.6) can also be recovered via delta-hedging. 
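
The delta-hedging computation can be sketched as a short backward recursion in Python. The node prices below are the ones that appear as coefficients in the deterministic equivalent above; the node labels, the dictionary layout, and the function names are hypothetical conveniences for this sketch.

```python
R = 0.25  # one-period risk-free rate, so 1 + r = 1.25
# Price of the risky asset at each node, keyed by the sequence of outcomes.
PRICES = {"": 40.0, "H": 80.0, "T": 20.0,
          "HH": 160.0, "HT": 40.0, "TH": 40.0, "TT": 10.0}

def payoff(s):
    """European put with strike 50."""
    return max(50.0 - s, 0.0)

def solve_node(node, depth, horizon):
    """Return (value, plan); plan maps each non-terminal node to holdings (x, y)."""
    if depth == horizon:
        return payoff(PRICES[node]), {}
    v_up, plan_up = solve_node(node + "H", depth + 1, horizon)
    v_down, plan_down = solve_node(node + "T", depth + 1, horizon)
    s_up, s_down = PRICES[node + "H"], PRICES[node + "T"]
    # Delta hedge: shares x and cash y that replicate both successor values.
    x = (v_up - v_down) / (s_up - s_down)
    y = (v_up - s_up * x) / (1.0 + R)
    plan = {node: (x, y)}
    plan.update(plan_up)
    plan.update(plan_down)
    return PRICES[node] * x + y, plan

price, plan = solve_node("", 0, 2)
print(round(price, 4))
for node, (x, y) in sorted(plan.items()):
    print(repr(node), round(x, 4), round(y, 4))
```

The holdings at the root and at nodes H and T coincide with (17.6) up to rounding, and the cost of the root portfolio is the option price.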


American Options 


Figure 17.1 Binomial tree for option pricing example 


Consider now an American option contract that can be exercised at any time 
$t = 0,1,\dots,N$ with payoff $g(S_t)$ for some function $g(\cdot)$ of the underlying asset price 
$S_t$. The stochastic program (17.2) has the following straightforward modification 
for finding a lowest-cost portfolio of the underlying asset and the risk-free asset 
that can be constructed at time 0 and be subsequently rebalanced to super-replicate 
the payoff of the American option contract: 


$$
\begin{aligned}
\min\ & S_0 x_0 + y_0 \\
\text{s.t.}\ & S_N x_{N-1} + (1+r)\, y_{N-1} \ge g(S_N) \\
& S_t x_{t-1} + (1+r)\, y_{t-1} \ge \max\{ S_t x_t + y_t,\ g(S_t) \}, \quad t = 1,\dots,N-1.
\end{aligned}
$$


The latter problem in turn can be equivalently stated as follows: 


$$
\begin{aligned}
\min\ & S_0 x_0 + y_0 \\
\text{s.t.}\ & S_N x_{N-1} + (1+r)\, y_{N-1} \ge g(S_N) \\
& S_t x_{t-1} + (1+r)\, y_{t-1} \ge S_t x_t + y_t, \quad t = 1,\dots,N-1 \\
& S_t x_{t-1} + (1+r)\, y_{t-1} \ge g(S_t), \quad t = 1,\dots,N-1.
\end{aligned}
\tag{17.7}
$$


Again for a binomial event tree model the above stochastic programming 
approach is equivalent to the binomial lattice approach discussed in Chapter 15 
in the absence of arbitrage. 
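
For comparison, the binomial lattice computation for an American option is a backward recursion that takes, at each node, the larger of the exercise value and the discounted risk-neutral continuation value. The Python sketch below applies it to an American put with strike 50; reusing the prices and interest rate of Example 17.1 is an assumption made for illustration, and the names are hypothetical.

```python
R = 0.25  # one-period risk-free rate
PRICES = {"": 40.0, "H": 80.0, "T": 20.0,
          "HH": 160.0, "HT": 40.0, "TH": 40.0, "TT": 10.0}

def payoff(s):
    """Put with strike 50."""
    return max(50.0 - s, 0.0)

def american_value(node, depth, horizon):
    """Value of the American option at a node by backward induction."""
    exercise = payoff(PRICES[node])
    if depth == horizon:
        return exercise
    s, s_up, s_down = PRICES[node], PRICES[node + "H"], PRICES[node + "T"]
    # Risk-neutral probability of the H branch at this node.
    p = ((1.0 + R) * s - s_down) / (s_up - s_down)
    v_up = american_value(node + "H", depth + 1, horizon)
    v_down = american_value(node + "T", depth + 1, horizon)
    continuation = (p * v_up + (1.0 - p) * v_down) / (1.0 + R)
    # The holder exercises whenever that is worth more than waiting.
    return max(exercise, continuation)

american_price = american_value("", 0, 2)
print(american_value("H", 1, 2), american_value("T", 1, 2), american_price)
```

Early exercise is attractive at node T, where the exercise value 30 exceeds the continuation value 20. This raises the time-0 value above the European price of 9.6 computed in Example 17.1.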


Transaction Costs 


The stochastic programming models (17.2) and (17.7) can be readily extended 
to incorporate proportional transaction costs. Observe that in the absence of 
transaction costs a transaction to sell $w$ shares of the risky asset when its price is 
$S$ will generate a revenue equal to $wS$. By contrast, if a proportional transaction 
cost $\theta$ applies to the sell transaction then the revenue would instead be $(1-\theta)wS$. 
Similarly, if a proportional transaction cost $\theta$ applies to a buy transaction of $w$ 
shares, then the cost of the transaction would be $(1+\theta)wS$. 

The stochastic programming model (17.2) can be modified as follows to 
account for a proportional transaction cost $\theta$ applicable to each buy or sell 
transaction of the risky asset: 


$$
\begin{aligned}
\min\ & S_0 x_0 + \theta |S_0 x_0| + y_0 \\
\text{s.t.}\ & S_N x_{N-1} + (1+r)\, y_{N-1} - \theta |S_N x_{N-1}| \ge g(S_N) \\
& S_t x_{t-1} + (1+r)\, y_{t-1} - \theta |S_t (x_t - x_{t-1})| \ge S_t x_t + y_t, \quad t = 1,\dots,N-1.
\end{aligned}
\tag{17.8}
$$

Similarly, the stochastic programming model (17.7) can also be modified to 
account for the same kind of transaction costs as follows: 


$$
\begin{aligned}
\min\ & S_0 x_0 + \theta |S_0 x_0| + y_0 \\
\text{s.t.}\ & S_N x_{N-1} + (1+r)\, y_{N-1} - \theta |S_N x_{N-1}| \ge g(S_N) \\
& S_t x_{t-1} + (1+r)\, y_{t-1} - \theta |S_t (x_t - x_{t-1})| \ge S_t x_t + y_t, \quad t = 1,\dots,N-1 \\
& S_t x_{t-1} + (1+r)\, y_{t-1} - \theta |S_t x_{t-1}| \ge g(S_t), \quad t = 1,\dots,N-1.
\end{aligned}
\tag{17.9}
$$

Observe that the stochastic programs (17.8) and (17.9) include terms with 
absolute values in the objective and constraints. The models can be recast as 
linear stochastic programs by introducing some extra variables and constraints, 
as the following example illustrates. 


Example 17.2 Suppose $N = 2$, $r = \frac{1}{4}$, and the prices $S_0$, $S_1$, $S_2$ of the risky 
asset are as indicated at the nodes of the binomial tree depicted in Figure 17.1. 
Assume that the two branches emerging from each node are equally likely. 
Determine the price of a European put option maturing at time $N = 2$ with 
strike price 50; that is, with payoff $g(S_2) = (50 - S_2)^+$. Assume a proportional 
transaction cost $\theta$ applies to every buy or sell transaction. 

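In this case the deterministic equivalent of (17.8) is 


$$
\begin{aligned}
\min\ & 40 x_0 + y_0 + 40\theta u_0 \\
\text{s.t.}\ & 80 x_0 + 1.25 y_0 - 80\theta v_1(H) \ge 80 x_1(H) + y_1(H) \\
& 20 x_0 + 1.25 y_0 - 20\theta v_1(T) \ge 20 x_1(T) + y_1(T) \\
& 160 x_1(H) + 1.25 y_1(H) - 160\theta w_1(H) \ge (50 - 160)^+ = 0 \\
& 40 x_1(H) + 1.25 y_1(H) - 40\theta w_1(H) \ge (50 - 40)^+ = 10 \\
& 40 x_1(T) + 1.25 y_1(T) - 40\theta w_1(T) \ge (50 - 40)^+ = 10 \\
& 10 x_1(T) + 1.25 y_1(T) - 10\theta w_1(T) \ge (50 - 10)^+ = 40 \\
& u_0 \ge x_0, \quad u_0 \ge -x_0 \\
& v_1(H) \ge x_1(H) - x_0, \quad v_1(H) \ge -x_1(H) + x_0 \\
& v_1(T) \ge x_1(T) - x_0, \quad v_1(T) \ge -x_1(T) + x_0 \\
& w_1(H) \ge x_1(H), \quad w_1(H) \ge -x_1(H) \\
& w_1(T) \ge x_1(T), \quad w_1(T) \ge -x_1(T).
\end{aligned}
$$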


Table 17.1 shows the optimal value and holdings $x_0$, $y_0$, $x_1(H)$, $y_1(H)$, $x_1(T)$, $y_1(T)$ 
of the optimal super-replicating portfolio for various levels of transaction cost $\theta$. 

Table 17.1 
$\theta$     Optimal value   $x_0$       $y_0$       $x_1(H)$    $y_1(H)$    $x_1(T)$    $y_1(T)$ 
0      9.6             -0.26666   20.26666   -0.08333   10.66666   -1         40 
0.01   9.93044         -0.26879   20.57443   -0.08251   10.66666   -0.9901    40 
0.05   11.24337        -0.27546   21.71077   -0.07937   10.66666   -0.95238   40 
0.1    12.84987        -0.28052   22.94857   -0.07576   10.66666   -0.90909   40 
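
As a sanity check on the table, the Python sketch below plugs each row into the deterministic equivalent of (17.8) for this example, setting every auxiliary variable to the absolute value it bounds, and reports the objective value and the smallest constraint slack. The function name is hypothetical, and the tolerance only reflects the five-decimal rounding of the table; this is a verification of the reported solutions, not a solver.

```python
def check_row(theta, x0, y0, x1h, y1h, x1t, y1t):
    """Return (objective, smallest slack) of a candidate solution for Example 17.2."""
    # At an optimum each auxiliary variable equals the absolute value it bounds.
    u0 = abs(x0)
    v1h, v1t = abs(x1h - x0), abs(x1t - x0)
    w1h, w1t = abs(x1h), abs(x1t)
    objective = 40 * x0 + y0 + 40 * theta * u0
    slacks = [
        80 * x0 + 1.25 * y0 - 80 * theta * v1h - (80 * x1h + y1h),
        20 * x0 + 1.25 * y0 - 20 * theta * v1t - (20 * x1t + y1t),
        160 * x1h + 1.25 * y1h - 160 * theta * w1h - 0,
        40 * x1h + 1.25 * y1h - 40 * theta * w1h - 10,
        40 * x1t + 1.25 * y1t - 40 * theta * w1t - 10,
        10 * x1t + 1.25 * y1t - 10 * theta * w1t - 40,
    ]
    return objective, min(slacks)

# Rows of Table 17.1: theta, optimal value, x0, y0, x1(H), y1(H), x1(T), y1(T).
TABLE = [
    (0.0, 9.6, -0.26666, 20.26666, -0.08333, 10.66666, -1.0, 40.0),
    (0.01, 9.93044, -0.26879, 20.57443, -0.08251, 10.66666, -0.9901, 40.0),
    (0.05, 11.24337, -0.27546, 21.71077, -0.07937, 10.66666, -0.95238, 40.0),
    (0.1, 12.84987, -0.28052, 22.94857, -0.07576, 10.66666, -0.90909, 40.0),
]
results = [(row[1],) + check_row(row[0], *row[2:]) for row in TABLE]
for reported, objective, slack in results:
    print(reported, round(objective, 4), round(slack, 4))
```

Each row reproduces its reported optimal value and satisfies all six super-replication constraints to within rounding, with slacks close to zero, as one expects when the constraints are binding.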


Synthetic Options 


An important issue in portfolio selection is the potential decline of the portfolio 
value below some critical limit. How can we control the risk of downside losses? A 
possible answer is to create a payoff structure similar to a European call option. 

While a corporate investor may be able to construct a diversified portfolio, 
there may be no option market available on this portfolio. One solution may be 
to use index options. However, exchange-traded options with sufficient liquidity 
are limited to maturities of about three months. This makes the cost of long-term 
protection expensive, requiring the purchase of a series of highly priced short- 
term options. For large institutional or corporate investors, a cheaper solution 
is to artificially produce the desired payoff structure using available resources. 
This is called a synthetic option strategy. A model of this kind was proposed by 
Zhao and Ziemba (2001) and can be described as follows. 


Problem parameters: 
$W_0$ = investor’s initial wealth 
$T$ = investor’s planning horizon 
$R$ = gross return of a riskless asset for one period 
$R_{i,t}$ = gross return for asset $i$ at time $t$ 
$\theta_{i,t}$ = transaction cost for purchases and sales of asset $i$ at time $t$. 


The gross returns $R_{i,t}$ above are random, but their distributions are known. 


Variables: 
$x_{i,t}$ = amount allocated to asset $i$ at time $t$ 
$A_{i,t}$ = amount of asset $i$ bought at time $t$ 
$D_{i,t}$ = amount of asset $i$ sold at time $t$ 
$y_t$ = amount allocated to riskless asset at time $t$. 


We formulate a stochastic program that produces the desired payoff at the 
end of the planning horizon T, much in the flavor of the stochastic programs 
developed in the previous section. Let us first discuss the constraints. 

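The initial portfolio must satisfy 


$$y_0 + x_{1,0} + \cdots + x_{n,0} = W_0.$$

Similarly, the portfolio at time $t$ must satisfy 


$$
\begin{aligned}
x_{i,t} &= R_{i,t}\, x_{i,t-1} + A_{i,t} - D_{i,t}, \quad \text{for } i = 1,\dots,n \text{ and } t = 1,\dots,T-1 \\
y_t &= R\, y_{t-1} - \sum_{i=1}^{n} (1 + \theta_{i,t})\, A_{i,t} + \sum_{i=1}^{n} (1 - \theta_{i,t})\, D_{i,t}, \quad \text{for } t = 1,\dots,T-1.
\end{aligned}
$$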

One can also impose upper bounds on the proportion of any risky asset in the 
portfolio: 


$$0 \le x_{i,t} \le m_t \left( y_t + \sum_{j=1}^{n} x_{j,t} \right),$$


where $m_t$ is chosen by the investor. 
The value of the portfolio at the end of the planning horizon is 


$$v = R\, y_{T-1} + \sum_{i=1}^{n} (1 - \theta_{i,T})\, R_{i,T}\, x_{i,T-1},$$


where the summation term is the value of the risky assets at time $T$. 

To construct the desired synthetic option, we split $v$ into the riskless value of 
the portfolio $Z$ and a surplus $z \ge 0$ which depends on random events. Using a 
scenario approach to the stochastic program, $Z$ is the worst-case payoff over all 
the scenarios. The surplus $z$ is a random variable that depends on the scenario. 
Thus 


$$v = Z + z, \qquad z \ge 0.$$

We consider $Z$ and $z$ as variables of the problem, and we optimize them 
together with the asset allocations $x$ and other variables described earlier. The 
objective function of the stochastic program is 


$$\max\ E(z) + \mu Z,$$


where $\mu \ge 1$ is the risk aversion of the investor. The risk aversion $\mu$ is given 
data. When $\mu = 1$, the objective is to maximize expected return. When $\mu$ is very 
large, the objective is to maximize “riskless profit”. 
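
To see why $Z$ ends up equal to the worst-case payoff, note that for fixed terminal values $v$ the objective equals $E(v) + (\mu - 1) Z$, which is non-decreasing in $Z$ when $\mu \ge 1$, while $z \ge 0$ forces $Z$ to be at most the smallest terminal value. The Python sketch below carries out this split for a fixed set of scenario values; the numbers and the function name are hypothetical.

```python
def split_payoff(terminal_values, probabilities, mu):
    """For fixed terminal values v_s, return (Z, z, objective) with Z at its best level."""
    # Feasibility requires z_s = v_s - Z >= 0, hence Z <= min_s v_s.
    # With mu >= 1 the objective E(v) + (mu - 1) * Z is non-decreasing in Z.
    floor = min(terminal_values)
    surplus = [v - floor for v in terminal_values]
    expected_surplus = sum(p * z for p, z in zip(probabilities, surplus))
    return floor, surplus, expected_surplus + mu * floor

# Hypothetical terminal portfolio values in four equally likely scenarios.
VALUES = [1.20, 1.05, 1.10, 0.98]
PROBS = [0.25, 0.25, 0.25, 0.25]

Z, z, objective = split_payoff(VALUES, PROBS, 2.5)
print(Z, [round(s, 4) for s in z], round(objective, 4))
```

In the full model the terminal values are themselves decision-dependent, so a larger $\mu$ steers the allocation toward portfolios with a higher guaranteed value $Z$ at the expense of expected surplus.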


Example 17.3 Consider an investor with initial wealth $W_0 = 1$ who wants to 
construct a portfolio comprising one risky asset and one riskless asset using the 
“synthetic option” model described above. We next describe the deterministic 
equivalent of this model for a two-period planning horizon, i.e. $T = 2$, and an 
event tree with four scenarios. The construction is similar to that in Example 
16.2. Suppose the return on the riskless asset is a non-random value $R$ per period 
and there are two equally likely possible random outcomes (“H” and “T”) over 
each time period. Let $R_t(H)$ and $R_t(T)$ denote the return of the risky asset 
in the period $[t-1,t]$ when the outcome is H and T respectively. Suppose the 
transaction cost for purchases and sales of the risky asset is a non-random value $\theta$. 

The scenario tree in this case is identical to that depicted in Figure 16.1 for 
Example 16.2. The deterministic equivalent of the multi-stage stochastic linear 
program in this case is as follows: 


$$
\begin{aligned}
\max\ & \tfrac{1}{4} \left( z(HH) + z(HT) + z(TH) + z(TT) \right) + \mu Z \\
\text{s.t.}\ & y_0 + x_0 = 1 \\
& x_1(H) = R_1(H)\, x_0 + A_1(H) - D_1(H) \\
& x_1(T) = R_1(T)\, x_0 + A_1(T) - D_1(T) \\
& y_1(H) = R\, y_0 - (1+\theta)\, A_1(H) + (1-\theta)\, D_1(H) \\
& y_1(T) = R\, y_0 - (1+\theta)\, A_1(T) + (1-\theta)\, D_1(T) \\
& z(HH) + Z = R\, y_1(H) + (1-\theta)\, R_2(H)\, x_1(H) \\
& z(HT) + Z = R\, y_1(H) + (1-\theta)\, R_2(T)\, x_1(H) \\
& z(TH) + Z = R\, y_1(T) + (1-\theta)\, R_2(H)\, x_1(T) \\
& z(TT) + Z = R\, y_1(T) + (1-\theta)\, R_2(T)\, x_1(T) \\
& x_0,\ y_0,\ x_1(H),\ x_1(T),\ y_1(H),\ y_1(T),\ A_1(H),\ D_1(H),\ A_1(T),\ D_1(T) \ge 0 \\
& z(HH),\ z(HT),\ z(TH),\ z(TT) \ge 0 \\
& Z \ \text{free}.
\end{aligned}
$$


Zhao and Ziemba (2001) introduce and apply the above generic synthetic 
option model to an example with three assets (cash, bonds, and stocks) and 
four periods (a one-year horizon with quarterly portfolio reviews). The quarterly 
return on cash is constant at $\rho = 0.0095$. For stocks and bonds, the expected 
logarithmic rates of returns are $s = 0.04$ and $b = 0.019$ respectively. Transaction 
costs are assumed to be 0.5% for stocks and 0.1% for bonds. The scenarios needed 
in the stochastic program are generated using an autoregression model which is 
constructed based on historical data (quarterly returns from 1985 to 1998; the 
Salomon Brothers bond index and S&P 500 index respectively). Specifically, the 
autoregression model is 


$$
\begin{aligned}
s_t &= 0.037 - 0.193 s_{t-1} + 0.418 b_{t-1} - 0.172 s_{t-2} + 0.517 b_{t-2} + \epsilon_t \\
b_t &= 0.007 - 0.140 s_{t-1} + 0.175 b_{t-1} - 0.023 s_{t-2} + 0.122 b_{t-2} + \eta_t,
\end{aligned}
$$


where the pair $(\epsilon_t, \eta_t)$ characterizes uncertainty. Zhao and Ziemba used a random 
sampling approach to estimate the joint distribution of $(\epsilon_t, \eta_t)$. From this joint 
distribution of $(\epsilon_t, \eta_t)$ a set of 20 pairs can be selected to estimate the empirical 
distribution of $(\epsilon_t, \eta_t)$. In this way, a scenario tree with 160,000 ($= 20 \times 20 \times 
20 \times 20$) paths describing possible outcomes of asset returns is generated for the 
four periods. 
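
The construction of such a tree can be sketched in Python as follows. The autoregression coefficients are those quoted above; the starting history, the two shock pairs (used instead of 20 to keep the example small), and the function names are hypothetical, since the sampled pairs themselves are not reported here.

```python
def next_log_returns(s1, b1, s2, b2, eps=0.0, eta=0.0):
    """One step of the autoregression; (s1, b1) are lag-1 and (s2, b2) lag-2 values."""
    s = 0.037 - 0.193 * s1 + 0.418 * b1 - 0.172 * s2 + 0.517 * b2 + eps
    b = 0.007 - 0.140 * s1 + 0.175 * b1 - 0.023 * s2 + 0.122 * b2 + eta
    return s, b

def extend_paths(paths, shocks):
    """Branch every path (a list of (s, b) pairs, oldest first) on every shock pair."""
    extended = []
    for path in paths:
        (s2, b2), (s1, b1) = path[-2], path[-1]
        for eps, eta in shocks:
            extended.append(path + [next_log_returns(s1, b1, s2, b2, eps, eta)])
    return extended

# Hypothetical two-quarter history and shock pairs.
history = [(0.04, 0.019), (0.04, 0.019)]
shocks = [(-0.05, -0.01), (0.05, 0.01)]

paths = [history]
for _ in range(4):  # four quarterly periods
    paths = extend_paths(paths, shocks)

print(len(paths), 20 ** 4)
print(next_log_returns(0.04, 0.019, 0.04, 0.019))
```

With two shock pairs per node the four-period tree has 16 paths; with 20 pairs per node the same construction yields the 160,000 paths mentioned above.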

The authors solved the resulting large deterministic linear program. We discuss 
some of the results obtained when this linear program is solved for a risk aversion 
of $\mu = 2.5$. The value of the terminal portfolio is always at least 4.6% more than 
the initial portfolio wealth and the distribution of terminal portfolio values is 
skewed to larger values because of dynamic downside risk control. The expected 
return is 16.33% and the volatility is 7.2%. It is interesting to compare these 
values with those obtained from a static Markowitz model. For the Markowitz 
model, the expected return is 15.4% for the same volatility but no minimum return 
is guaranteed. In fact, in some scenarios, the value of the Markowitz portfolio is 
5% less at the end of the one-year horizon than it was at the beginning. 

It is also interesting to look at a typical portfolio (one of the 160,000 paths) 
generated by the synthetic option model (the linear program was set up with an 
upper bound of 70% placed on the fraction of stocks or bonds in the portfolio). 


Quarter t   Cash   Stocks   Bonds   Portfolio value at the end of quarter t 


1 12% 18% 70% 103 
2 41% 59% 107 
3 70% 30% 112 
4 30% 70% 114 


Exercises 


Exercise 17.1 For a non-dividend paying stock, collect data on four or five 
call options for the nearest maturity (but at least one month). Calculate the 
implied volatility for each option; that is, the value of $\sigma$ that makes equation 
(15.7) hold for the market prices of the call options. Solve the option pricing 
problem (17.7) when the number of stages is seven using the implied volatility 
of the at-the-money option to construct the tree. 


Exercise 17.2 Repeat Exercise 17.1 allowing for transaction costs, with different 
values of $\theta$, to see if the volatility smile can be explained by transaction 
costs. Specifically, given a value for $\sigma$ and for $\theta$, calculate option prices and see 
how they match up to observed prices. Try $\theta = 0.001, 0.005, 0.01, 0.02, 0.05$. 


Exercise 17.3 Develop a synthetic option model in the spirit of that used by 
Zhao and Ziemba (2001), adapted to the size limitation of your linear program- 
ming solver. Compare with a static model. 


Part IV 


Other Optimization Techniques 


Conic Programming: Theory and Algorithms 


Conic programming refers to a class of convex optimization problems that gen- 
eralizes linear and quadratic programming. The gist of conic programming is to 
replace the non-negativity constraint with a conic constraint. 


Conic Programming 


A conic program in standard form is an optimization problem of the form 


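$$
\begin{array}{rl}
\min & c^T x\\
\text{s.t.} & Ax = b\\
& Dx - d \in K
\end{array}
\tag{18.1}
$$

for some vectors and matrices $c \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $d \in \mathbb{R}^p$, $A \in \mathbb{R}^{m \times n}$, $D \in \mathbb{R}^{p \times n}$ 
and some closed convex cone $K \subseteq \mathbb{R}^p$. 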

When $K = \mathbb{R}^p_+$, the problem (18.1) is a linear program. However, conic programming 
is far more general. We next discuss two particularly important classes 
of conic programs, namely second-order and semidefinite programming. 


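18.1.1 Second-Order Programming 


The second-order cone, also known as the Lorentz cone or the ice-cream cone, is 
defined as follows: 
$$
\mathbb{L}_n := \left\{ x = \begin{bmatrix} x_0 \\ \bar{x} \end{bmatrix} \in \mathbb{R}^n : \|\bar{x}\|_2 \le x_0 \right\}.
$$
See Figure 18.1. 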


A second-order program is a problem of the form (18.1) where K is a direct 
product of second-order cones; that is, 


$$
K = \mathbb{L}_{n_1} \times \cdots \times \mathbb{L}_{n_r}
$$

for some positive integers $n_1, \ldots, n_r$. 

We next illustrate the modeling power of second-order programming by show- 
ing that a convex quadratically constrained quadratic program can be recast 
as a second-order program. In particular, second-order programming generalizes 
both linear programming and convex quadratic programming. 


Figure 18.1 Second-order cone 


Consider a convex quadratically constrained quadratic program of the form 
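$$
\begin{array}{rl}
\min\limits_{x} & c_0^T x + \tfrac{1}{2} x^T Q_0 x\\
\text{s.t.} & c_i^T x + \tfrac{1}{2} x^T Q_i x \le b_i, \quad i = 1, \ldots, m,
\end{array}
\tag{18.2}
$$

where $c_i \in \mathbb{R}^n$, $Q_i \in \mathbb{S}^n_+$ for $i = 0, 1, \ldots, m$ and $b_i \in \mathbb{R}$ for $i = 1, \ldots, m$. Here $\mathbb{S}^n_+$ 
denotes the family of $n \times n$ positive semidefinite matrices. 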
Observe that (18.2) can be rewritten as 


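$$
\begin{array}{rl}
\min\limits_{x,t} & t\\
\text{s.t.} & c_0^T x + \tfrac{1}{2} x^T Q_0 x \le t\\
& c_i^T x + \tfrac{1}{2} x^T Q_i x \le b_i, \quad i = 1, \ldots, m.
\end{array}
\tag{18.3}
$$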


The following step is key in the formulation of (18.2) as a second-order program: 
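given $Q \in \mathbb{S}^n_+$, $c \in \mathbb{R}^n$, $b \in \mathbb{R}$ the quadratic inequality 

$$
c^T x + \tfrac{1}{2} x^T Q x \le b
$$

can be formulated as a second-order cone constraint. To see that, observe that 
because $Q \in \mathbb{S}^n_+$ there exists $L \in \mathbb{R}^{n \times p}$ such that $Q = LL^T$ (in particular the 
Cholesky factorization satisfies this requirement). Therefore 

$$
c^T x + \tfrac{1}{2} x^T Q x \le b
\;\Leftrightarrow\;
c^T x + \tfrac{1}{2} \|L^T x\|^2 \le b
\;\Leftrightarrow\;
\begin{bmatrix} b - c^T x + 1 \\ b - c^T x - 1 \\ \sqrt{2}\, L^T x \end{bmatrix} \in \mathbb{L}_{p+2}.
$$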


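It thus follows that (18.3) and in turn (18.2) can be rewritten as the following 
second-order program: 

$$
\begin{array}{rl}
\min\limits_{x,t} & t\\
\text{s.t.} & \begin{bmatrix} t - c_0^T x + 1 \\ t - c_0^T x - 1 \\ \sqrt{2}\, L_0^T x \end{bmatrix} \in \mathbb{L}_{p_0+2}\\[2ex]
& \begin{bmatrix} b_i - c_i^T x + 1 \\ b_i - c_i^T x - 1 \\ \sqrt{2}\, L_i^T x \end{bmatrix} \in \mathbb{L}_{p_i+2}, \quad i = 1, \ldots, m,
\end{array}
$$

where $L_i \in \mathbb{R}^{n \times p_i}$ is such that $Q_i = L_i L_i^T$ for $i = 0, 1, \ldots, m$. 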
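
The equivalence behind this reformulation can be checked numerically. The following Python sketch evaluates both sides, the quadratic inequality and the second-order cone membership, at a few points. The data `L`, `c`, `b` and the test points are made up for illustration, and the helper names are hypothetical.

```python
import math


def lt_times_x(L, x):
    """Compute L^T x for an n-by-p matrix L stored as a list of rows."""
    n, p = len(L), len(L[0])
    return [sum(L[i][j] * x[i] for i in range(n)) for j in range(p)]


def quadratic_holds(c, L, b, x):
    """Check c^T x + 0.5 * ||L^T x||^2 <= b, i.e. the inequality with Q = L L^T."""
    ctx = sum(ci * xi for ci, xi in zip(c, x))
    return ctx + 0.5 * sum(t * t for t in lt_times_x(L, x)) <= b


def in_second_order_cone(v, tol=1e-12):
    """Check ||v[1:]||_2 <= v[0]."""
    return math.sqrt(sum(t * t for t in v[1:])) <= v[0] + tol


def cone_vector(c, L, b, x):
    """Stack (b - c^T x + 1, b - c^T x - 1, sqrt(2) * L^T x)."""
    u = b - sum(ci * xi for ci, xi in zip(c, x))
    return [u + 1.0, u - 1.0] + [math.sqrt(2.0) * t for t in lt_times_x(L, x)]


# Illustrative data (assumption): n = 2, p = 2.
L = [[1.0, 0.0], [1.0, 2.0]]
c = [1.0, -1.0]
b = 2.0
points = [(0.5, 0.5), (0.0, 0.0), (-1.0, 1.0), (2.0, 2.0)]
results = [(quadratic_holds(c, L, b, x),
            in_second_order_cone(cone_vector(c, L, b, x))) for x in points]
print(results)
```

At each test point the two checks agree; the points were chosen away from the boundary so that rounding does not blur the comparison.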


Tracking Error and Volatility Constraints 


In the context of quantitative asset management, portfolios are typically chosen 
relative to some predetermined benchmark, as we discussed in Section 6.5. As a 
consequence, it is common to use a constraint on the active risk (also known as 
tracking error) instead of, or in addition to, the total risk. 

More precisely, suppose $x$ denotes the vector of percentage holdings in a 
portfolio. Let $r$ and $r_B$ denote respectively the vector of asset returns and the 
benchmark return. Recall that the active return is the difference $r^T x - r_B$ between 
the portfolio return and the benchmark return. The active risk is the variance 
of the active return. If $x^B$ denotes the vector of percentage holdings in the 
benchmark, then the active risk can be written as 

$$
\operatorname{var}\bigl(r^T (x - x^B)\bigr) = (x - x^B)^T V (x - x^B),
$$


where V is the covariance matrix of asset returns. 
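
For concreteness, the quadratic form above can be evaluated directly. The sketch below is a minimal Python illustration; the two-asset covariance matrix and the holdings are invented numbers, and `active_risk` is a hypothetical helper name.

```python
def active_risk(x, x_bench, V):
    """Return (x - x_B)^T V (x - x_B) for holdings x, benchmark x_B, covariance V."""
    d = [xi - bi for xi, bi in zip(x, x_bench)]
    Vd = [sum(V[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sum(di * vi for di, vi in zip(d, Vd))


# Illustrative two-asset data (assumption).
V = [[0.04, 0.01],
     [0.01, 0.02]]
x_bench = [0.5, 0.5]
x = [0.6, 0.4]

risk = active_risk(x, x_bench, V)  # variance of the active return
print(risk, risk ** 0.5)
```

Holding exactly the benchmark gives zero active risk, whatever the total risk of the portfolio is.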
A typical mean-variance model for benchmark-relative portfolio management 
has the following form: 


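$$
\begin{array}{rl}
\max\limits_{x} & \alpha^T x\\
\text{s.t.} & (x - x^B)^T V (x - x^B) \le \gamma^2\\
& Ax = b\\
& Cx \le d.
\end{array}
\tag{18.4}
$$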


Note that this is not a quadratic program because it has a nonlinear constraint. 
However, the problem (18.4) is a convex quadratically constrained quadratic 
program of the form (18.2) discussed in Section 18.1.1. Therefore, it has a 
straightforward formulation as a second-order conic program. 

The above model can be readily extended to include multiple measures of 
risk. For instance, the following model, which is an extension of the model dis- 
cussed by Jorion (2003) that enforces upper bound constraints on both total risk 
and tracking error, also has a straightforward second-order conic programming 
formulation: 
$$
\begin{array}{rl}
\max\limits_{x} & \alpha^T x\\
\text{s.t.} & (x - x^B)^T V (x - x^B) \le \gamma^2\\
& x^T V x \le \sigma^2\\
& Ax = b\\
& Cx \le d.
\end{array}
\tag{18.5}
$$


18.1.2 Semidefinite Programming 


Some applications, such as the approximation of covariance matrices discussed 
below, lead to conic optimization models involving the space of symmetric matrices 
and the cone of positive semidefinite matrices described next. Let $\mathbb{S}^n$ denote 
the space of $n \times n$ symmetric matrices. Although this space is equivalent to 
$\mathbb{R}^{n(n+1)/2}$ it is more convenient and customary to treat it as a space of matrices. 
A matrix $X \in \mathbb{S}^n$ is positive semidefinite if 

$$
u^T X u \ge 0 \quad \text{for all } u \in \mathbb{R}^n.
$$

It is a common convention to write $X \succeq 0$ to indicate that $X \in \mathbb{S}^n$ is positive 
semidefinite. The cone of positive semidefinite matrices $\mathbb{S}^n_+$ is defined as 

$$
\mathbb{S}^n_+ := \{X \in \mathbb{S}^n : X \succeq 0\}.
$$


See Figure 18.2. 




Figure 18.2 Cone of positive semidefinite matrices 


A semidefinite program is a problem of the form (18.1) where $K$ is the cone of 
positive semidefinite matrices. 
Endow the space $\mathbb{S}^n$ of symmetric $n \times n$ matrices with the following Frobenius 
inner product. For $X, S \in \mathbb{S}^n$ let 

$$
X \bullet S := \operatorname{trace}(XS) = \sum_{i,j} X_{ij} S_{ij}.
$$

A semidefinite program is in standard form if it is written as 
$$
\begin{array}{rl}
\min\limits_{X} & C \bullet X\\
\text{s.t.} & \mathcal{A}X = b\\
& X \succeq 0,
\end{array}
$$

where $C \in \mathbb{S}^n$, $b \in \mathbb{R}^m$, and $\mathcal{A} : \mathbb{S}^n \to \mathbb{R}^m$ is a linear mapping. 


Approximating Covariance Matrices 


We next illustrate the modeling power of semidefinite programming by showing 
that a covariance estimation problem can be recast as a semidefinite program. 
To that end, recall that any proper covariance matrix must be symmetric and 
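positive semidefinite. Suppose $\hat{V} \in \mathbb{S}^n$ is an estimate of a covariance matrix that 
is not necessarily positive semidefinite and consider the problem of finding the 
positive semidefinite matrix that is closest to $\hat{V}$; that is, 

$$
\begin{array}{rl}
\min\limits_{X} & \|X - \hat{V}\|\\
\text{s.t.} & X \succeq 0.
\end{array}
\tag{18.6}
$$

This problem can be formulated as 

$$
\begin{array}{rl}
\min\limits_{X,t} & t\\
\text{s.t.} & \|X - \hat{V}\| \le t\\
& X \succeq 0.
\end{array}
$$

If the norm $\|X - \hat{V}\|$ is the Frobenius norm 

$$
\|X - \hat{V}\|_F := \sqrt{\sum_{i,j} (X_{ij} - \hat{V}_{ij})^2},
$$

then the constraint $\|X - \hat{V}\|_F \le t$ can be written as a second-order cone 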
constraint and it follows that the above covariance estimation problem (18.6) 
can be written as a conic program over the Cartesian product of a second-order 
cone and a semidefinite cone. As we detail in the exercises at the end of this 
chapter, problem (18.6) can also be formulated as a conic program over a suitable 
semidefinite cone for other choices of norms such as the operator norm or the 
infinity norm. 
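
For the Frobenius norm, problem (18.6) is also known to have a closed-form solution: diagonalize $\hat{V}$ and replace its negative eigenvalues by zero. The Python sketch below does this for a $2 \times 2$ matrix only, where the eigen-decomposition can be written by hand. It is meant as a small sanity check against which a conic solver's answer could be compared, not as a general routine; the function name and the example matrix are hypothetical.

```python
import math


def nearest_psd_2x2(a, b, d):
    """Frobenius-nearest PSD matrix to [[a, b], [b, d]] by eigenvalue clipping."""
    if b == 0.0:
        # Already diagonal: clip the diagonal entries.
        return [[max(a, 0.0), 0.0], [0.0, max(d, 0.0)]]
    mid = 0.5 * (a + d)
    rad = math.hypot(0.5 * (a - d), b)
    X = [[0.0, 0.0], [0.0, 0.0]]
    for lam in (mid + rad, mid - rad):
        if lam <= 0.0:
            continue  # a negative eigenvalue is replaced by zero
        v = (lam - d, b)  # eigenvector of [[a, b], [b, d]] for eigenvalue lam
        norm2 = v[0] ** 2 + v[1] ** 2
        for i in range(2):
            for j in range(2):
                X[i][j] += lam * v[i] * v[j] / norm2
    return X


# Illustrative "estimate" with eigenvalues 3 and -1, hence not PSD (assumption).
X = nearest_psd_2x2(1.0, 2.0, 1.0)
dist = math.sqrt(sum((X[i][j] - [[1.0, 2.0], [2.0, 1.0]][i][j]) ** 2
                     for i in range(2) for j in range(2)))
print(X, dist)
```

In this example the distance to the nearest positive semidefinite matrix equals the magnitude of the clipped eigenvalue.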
Numerical Conic Programming Solvers 


During the last three decades there have been major advancements in the theory, 
algorithms, and software for conic programming. In particular, the software 
packages SeDuMi and SDPT3 are MATLAB-based, freely available solvers for conic 
programs. The commercial software vendors Gurobi and MOSEK also offer solvers 
for conic programs. These solvers constitute the engine behind the MATLAB 
modeling language CVX that we have discussed in previous chapters. 

Both SeDuMi and SDPT3 as well as most other software packages are designed 
to solve a conic program in the following standard form: 


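$$
\begin{array}{rl}
\min\limits_{x} & c^T x\\
\text{s.t.} & Ax = b\\
& x \in K,
\end{array}
$$

where the cone $K$ is a Cartesian product of the form 
$\mathbb{R}^f \times \mathbb{R}^\ell_+ \times \mathbb{L}_{n_1} \times \cdots \times \mathbb{L}_{n_r} \times \mathbb{S}^{s_1}_+ \times \cdots \times \mathbb{S}^{s_k}_+$. 
In other words, the conic constraint $x \in K$ models a vector 
$x$ that has a block of $f$ free components, followed by a block of $\ell$ non-negative 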
components, followed by r blocks of second-order cone-constrained components, 
and finally followed by k blocks of semidefinite-constrained components. The 
package SeDuMi uses the following syntax: 


>> [x,y,info] = sedumi(A,b,c,K); 


Here K is a MATLAB structure with fields K.f, K.l, K.q, K.s detailing the 
dimensions of the above blocks. The matrix and vectors A, b, c should be of the 
appropriate dimensions. 
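
One way to guard against dimension mistakes is to compute the length that the vector $x$ must have from the block dimensions before calling the solver. The Python sketch below mirrors the fields of K in a dictionary. It assumes, as the SeDuMi documentation is understood to describe, that each semidefinite block of order $s$ is stored as a vector with $s^2$ entries; the helper name is hypothetical and this is not part of SeDuMi itself.

```python
def cone_vector_length(K):
    """Length of x implied by block dimensions K (keys f, l, q, s).

    f: number of free components, l: number of non-negative components,
    q: list of second-order cone dimensions, s: list of semidefinite block orders.
    Assumption: a semidefinite block of order s occupies s*s entries of x.
    """
    return (K.get("f", 0) + K.get("l", 0)
            + sum(K.get("q", []))
            + sum(order * order for order in K.get("s", [])))


# Example: 1 free, 2 non-negative, cones of dimension 3 and 4, one 2x2 block.
K = {"f": 1, "l": 2, "q": [3, 4], "s": [2]}
n = cone_vector_length(K)
print(n)
```

The number of columns of A and the length of c should both equal this value.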

The package SDPT3 uses a similar syntax. It should be noted that although 
the process of formatting a particular problem in the appropriate SeDuMi or 
SDPT3 format is relatively routine, the particular details and steps could in some 
cases introduce errors. The modeling environment provided by CVX performs that 
formatting in an automated fashion. 


Duality and Optimality Conditions 


As in linear and quadratic programming, there is a dual conic program associated 
with every primal conic program. The construction of the dual conic program 
relies on the following more fundamental construction. Let $K \subseteq \mathbb{R}^p$ be a closed 
convex cone. The dual cone $K^* \subseteq \mathbb{R}^p$ of $K$ is defined as 

$$
K^* := \{s \in \mathbb{R}^p : s^T x \ge 0 \text{ for all } x \in K\}.
$$

It is easy to see that $K^* \subseteq \mathbb{R}^p$ is also a closed convex cone. 
Just as we did for linear and quadratic programming, the dual problem can 
be derived via the following Lagrangian function associated with (18.1): 


$$
L(x, y, s) := c^T x + y^T (b - Ax) + s^T (d - Dx).
$$
The constraints of (18.1) can be encoded via the Lagrangian function. For a 
given vector $x$ 

$$
\max_{y,\; s \in K^*} L(x, y, s) =
\begin{cases}
c^T x & \text{if } Ax = b \text{ and } Dx - d \in K\\
+\infty & \text{otherwise.}
\end{cases}
$$


Therefore the primal problem (18.1) can be written as 


$$
\min_{x} \; \max_{y,\; s \in K^*} L(x, y, s).
$$

The dual problem is obtained by flipping the order of the min and max operations: 

$$
\max_{y,\; s \in K^*} \; \min_{x} L(x, y, s).
$$


It is easy to see that the dual problem can be written as follows: 
$$
\begin{array}{rl}
\max\limits_{y,s} & b^T y + d^T s\\
\text{s.t.} & A^T y + D^T s = c\\
& s \in K^*.
\end{array}
\tag{18.7}
$$


In particular, when the primal problem is in the following standard form, 


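$$
\begin{array}{rl}
\min\limits_{x} & c^T x\\
\text{s.t.} & Ax = b\\
& x \in K,
\end{array}
$$
the dual problem is 
$$
\begin{array}{rl}
\max\limits_{y,s} & b^T y\\
\text{s.t.} & A^T y + s = c\\
& s \in K^*.
\end{array}
$$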


Observe that the dual problem of a conic program is again a conic program. 
As in linear and quadratic programming, there is a deep connection between 
the primal problem (18.1) and its dual (18.7). The following result follows by 
construction. 


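Theorem 18.1 (Weak duality) Assume $x$ is a feasible point for (18.1) and 
$(y, s)$ is a feasible point for (18.7). Then 
$$
b^T y + d^T s \le c^T x.
$$
Proof If $x$ and $(y, s)$ satisfy the above assumptions then 
$$
\begin{aligned}
b^T y + d^T s &\le (Ax)^T y + (Dx)^T s\\
&= (A^T y + D^T s)^T x\\
&= c^T x.
\end{aligned}
$$
The inequality holds because $Ax = b$ and $s^T (Dx - d) \ge 0$, as $s \in K^*$ and $Dx - d \in K$. 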
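
Weak duality is easy to observe on a small instance. The Python sketch below uses a made-up problem with $K = \mathbb{R}^2_+$ (so that $K^* = K$), one equality constraint, $D = I$ and $d = 0$, and compares the primal and dual objective values at a pair of feasible points. All data and names are illustrative assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


# Illustrative data: min x1 + 2*x2  s.t.  x1 + x2 = 1,  x in K = R^2_+.
A = [[1.0, 1.0]]
b = [1.0]
D = [[1.0, 0.0], [0.0, 1.0]]
d = [0.0, 0.0]
c = [1.0, 2.0]

x = [0.5, 0.5]                 # primal feasible: Ax = b, Dx - d >= 0
y, s = [0.5], [0.5, 1.5]       # dual feasible: A^T y + D^T s = c, s >= 0

dual_residual = [aty + dts - ci for aty, dts, ci in
                 zip(matvec(transpose(A), y), matvec(transpose(D), s), c)]
primal_value = dot(c, x)
dual_value = dot(b, y) + dot(d, s)
gap = primal_value - dual_value
slack_product = dot(s, [u - v for u, v in zip(matvec(D, x), d)])
print(primal_value, dual_value, gap, slack_product)
```

The duality gap coincides with $s^T (Dx - d)$, which is exactly the quantity bounded below by zero in the proof.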
In contrast to linear and quadratic programming, strong duality does not 
always hold for conic programming. Fortunately, strong duality holds under a 
mild regularity assumption. The conic program (18.1) satisfies the Slater condition 
if there exists $x \in \mathbb{R}^n$ such that 

$$
Ax = b, \quad Dx - d \in \operatorname{relint}(K),
$$

where $\operatorname{relint}(K)$ denotes the relative interior of the set $K$. 
Similarly, the conic program (18.7) satisfies the Slater condition if there exist 
$y \in \mathbb{R}^m$ and $s \in \operatorname{relint}(K^*)$ such that 

$$
A^T y + D^T s = c.
$$


Theorem 18.2 (Strong duality) Suppose the problems (18.1) and (18.7) satisfy 
the Slater condition. Then both problems have optimal solutions and their optimal 
values are the same. 


We refer the reader to Güler (2010) or Renegar (2001) for a proof of Theorem 
18.2. The following characterization of the solutions to both (18.1) and (18.7) 
readily follows from Theorem 18.1 and Theorem 18.2. 


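Theorem 18.3 (Optimality conditions) The vectors $x \in \mathbb{R}^n$ and $(y, s) \in \mathbb{R}^m \times \mathbb{R}^p$ 
are optimal solutions to (18.1) and (18.7) respectively if 

$$
\begin{aligned}
c - A^T y - D^T s &= 0\\
Ax - b &= 0\\
Dx - d &\in K\\
s &\in K^*\\
s^T (Dx - d) &= 0.
\end{aligned}
\tag{18.8}
$$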


The following partial converse also holds: if (18.1) and (18.7) satisfy the Slater 
condition then they both have optimal solutions x and (y,s) that satisfy (18.8). 


For a conic program in standard form, the optimality conditions (18.8) can be 
written as follows: 
$$
\begin{aligned}
A^T y + s &= c\\
Ax &= b\\
x &\in K\\
s &\in K^*\\
s^T x &= 0.
\end{aligned}
\tag{18.9}
$$
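
As an illustration, conditions (18.9) can be verified for a small standard-form problem over the second-order cone $\mathbb{L}_3$, using the fact that this cone is its own dual. The instance below (minimize $2x_0 - x_1$ subject to $x_0 = 1$, $x \in \mathbb{L}_3$) and the candidate solutions are constructed for this sketch; the function names are hypothetical.

```python
import math


def in_soc(v, tol=1e-12):
    """Check membership in the second-order cone: ||v[1:]||_2 <= v[0]."""
    return math.sqrt(sum(t * t for t in v[1:])) <= v[0] + tol


def satisfies_conditions(A, b, c, x, y, s, tol=1e-12):
    """Check the five conditions of (18.9) with K = K* = second-order cone."""
    m, n = len(b), len(c)
    dual_eq = all(abs(sum(A[i][j] * y[i] for i in range(m)) + s[j] - c[j]) <= tol
                  for j in range(n))
    primal_eq = all(abs(sum(A[i][j] * x[j] for j in range(n)) - b[i]) <= tol
                    for i in range(m))
    complementary = abs(sum(si * xi for si, xi in zip(s, x))) <= tol
    return dual_eq and primal_eq and in_soc(x) and in_soc(s) and complementary


# Illustrative instance (assumption).
A = [[1.0, 0.0, 0.0]]
b = [1.0]
c = [2.0, -1.0, 0.0]

x_opt, y_opt, s_opt = [1.0, 1.0, 0.0], [1.0], [1.0, -1.0, 0.0]
x_other = [1.0, 0.0, 0.0]   # feasible, but s^T x = 1, so not optimal

print(satisfies_conditions(A, b, c, x_opt, y_opt, s_opt),
      satisfies_conditions(A, b, c, x_other, y_opt, s_opt))
```

At the pair that satisfies all five conditions the primal and dual objective values agree (both equal 1), as weak duality requires of an optimal pair.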


Algorithms 


By relying on the structure of the cone K, the main algorithmic template of 
interior-point methods, such as the one described in Chapter 2 and in Chapter 
5, can be extended to a larger class of conic programs. The central idea is to 
generate a sequence of points that converges to a solution to (18.9). We next 
sketch the gist of interior-point methods for semidefinite programming. For a 
full and in-depth treatment of this interesting material we refer the reader to the 
seminal articles by Nesterov and Todd (1997, 1998) and Schmieta and Alizadeh 
(2001, 2003) and the textbooks by Nesterov (2004), Nesterov and Nemirovskii 
(1994), and Renegar (2001). 
For convenience of exposition, we consider a semidefinite program in standard 
form: 
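$$
\begin{array}{rl}
\min & C \bullet X\\
\text{s.t.} & \mathcal{A}X = b\\
& X \succeq 0,
\end{array}
\tag{18.10}
$$

where $C \in \mathbb{S}^n$, $b \in \mathbb{R}^m$, and $\mathcal{A} : \mathbb{S}^n \to \mathbb{R}^m$ is a linear mapping. As the exercises 
at the end of this chapter detail, the dual of (18.10) is the semidefinite program 

$$
\begin{array}{rl}
\max\limits_{y,S} & b^T y\\
\text{s.t.} & \mathcal{A}^* y + S = C\\
& S \succeq 0,
\end{array}
\tag{18.11}
$$

where $\mathcal{A}^* : \mathbb{R}^m \to \mathbb{S}^n$ denotes the adjoint of $\mathcal{A}$; that is, the unique linear mapping 
satisfying 

$$
(\mathcal{A}X)^T y = X \bullet \mathcal{A}^* y
$$

for all $X \in \mathbb{S}^n$ and all $y \in \mathbb{R}^m$. 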
We will rely on the following key property of positive semidefinite matrices: 


$$
X, S \in \mathbb{S}^n_+ \text{ and } X \bullet S = 0 \;\Rightarrow\; XS = 0.
\tag{18.12}
$$

From Theorem 18.3 and (18.12) it follows that $X$ and $(y, S)$ are optimal solutions 
to (18.10) and (18.11) respectively if 

$$
\begin{aligned}
\mathcal{A}^* y + S &= C\\
\mathcal{A}X &= b\\
XS &= 0\\
X, S &\succeq 0.
\end{aligned}
\tag{18.13}
$$

As in the linear programming case, interior-point methods for semidefinite programming 
generate a sequence of iterates that satisfy $X, S \succ 0$. Each iteration of 
the algorithm aims to make progress towards satisfying $\mathcal{A}^* y + S = C$, $\mathcal{A}X = b$, 
and $XS = 0$. 

Given $\mu > 0$, let $(X(\mu), y(\mu), S(\mu))$ be the solution to the following perturbed 
version of the above optimality conditions: 

$$
\begin{bmatrix} \mathcal{A}^* y + S - C \\ \mathcal{A}X - b \\ XS \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ \mu I \end{bmatrix},
\quad X, S \succ 0.
$$

The first condition above can be written as $r_\mu(X, y, S) = 0$ for the residual 
vector: 
$$
r_\mu(X, y, S) := \begin{bmatrix} \mathcal{A}^* y + S - C \\ \mathcal{A}X - b \\ XS - \mu I \end{bmatrix}.
$$

The central path is the set $\{(X(\mu), y(\mu), S(\mu)) : \mu > 0\}$. It is intuitively clear 
that $(X(\mu), y(\mu), S(\mu))$ converges to an optimal solution to both (18.10) and 
(18.11) as $\mu \downarrow 0$. This suggests the following algorithmic strategy: suppose $(X, y, S)$ is 
“near” $(X(\mu), y(\mu), S(\mu))$ for some $\mu > 0$. Use $(X, y, S)$ to move to a better 
point $(X^+, y^+, S^+)$ “near” $(X(\mu^+), y(\mu^+), S(\mu^+))$ for some $\mu^+ < \mu$. 

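It can be shown that if a point $(X, y, S)$ is on the central path, then the 
corresponding value of $\mu$ satisfies $X \bullet S = n\mu$. Likewise, given $X, S \succ 0$, define 

$$
\mu(X, S) := \frac{X \bullet S}{n}.
$$

To move from a current point $(X, y, S)$ to a new point, we use a suitable Newton 
step; that is, the solution to the following system of equations obtained as a 
linearization of the system of nonlinear equations $r_\mu(X, y, S) = 0$: 

$$
\begin{bmatrix} 0 & \mathcal{A}^* & I \\ \mathcal{A} & 0 & 0 \\ F & 0 & G \end{bmatrix}
\begin{bmatrix} \Delta X \\ \Delta y \\ \Delta S \end{bmatrix}
=
\begin{bmatrix} C - \mathcal{A}^* y - S \\ b - \mathcal{A}X \\ \mu X^{-1} - S \end{bmatrix}
\tag{18.14}
$$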


for some suitably chosen mappings F, G that depend on the current X, S. The 
details of these mappings are somewhat technical and related to nuances con- 
cerning the space of symmetric n x n matrices. Further details can be found 
in Renegar (2001) and in the exercises at the end of the chapter. 

Algorithm 18.1 presents a template for an interior-point method for semidefi- 
nite programming. 


Algorithm 18.1 Interior-point method for semidefinite programming 

1: choose $X^0, S^0 \succ 0$ and $y^0 \in \mathbb{R}^m$ 

2: for $k = 0, 1, \ldots$ do 

3: solve the Newton system (18.14) for $(X, y, S) = (X^k, y^k, S^k)$ and $\mu := 0.1\,\mu(X^k, S^k)$ 

4: choose a step length $\alpha \in (0, 1]$ and set 
   $(X^{k+1}, y^{k+1}, S^{k+1}) := (X^k, y^k, S^k) + \alpha(\Delta X, \Delta y, \Delta S)$ 

5: end for 


The step length $\alpha$ in step 4 should be chosen so that $X^{k+1}, S^{k+1} \succ 0$ and 
the size of $r_\mu(X^{k+1}, y^{k+1}, S^{k+1})$ is sufficiently smaller than that of $r_\mu(X^k, y^k, S^k)$. A 
line-search procedure such as the one described in Algorithm 2.4 in Chapter 2 
can be used for choosing the step length $\alpha$. 
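
The template is easiest to follow in the special case of diagonal matrices $X = \operatorname{diag}(x)$ and $S = \operatorname{diag}(s)$, where the semidefinite program reduces to a linear program and the condition $XS = \mu I$ becomes $x_i s_i = \mu$ for each $i$. The Python sketch below implements this special case only, for a problem with a single equality constraint so that the Newton system can be solved by hand. The step-length rule (90% of the way to the boundary, capped at 1) is a simple stand-in for a line search; it is an assumption of this sketch, not the procedure of Algorithm 2.4, and the function name and test problem are hypothetical.

```python
def interior_point_lp(a, b, c, tol=1e-9, max_iter=50):
    """Path-following sketch for: min c^T x  s.t.  a^T x = b, x >= 0."""
    n = len(c)
    x, s, y = [1.0] * n, [1.0] * n, 0.0      # step 1: x, s > 0
    for _ in range(max_iter):                # step 2
        mu = sum(xi * si for xi, si in zip(x, s)) / n
        r_p = b - sum(ai * xi for ai, xi in zip(a, x))
        r_d = [ci - ai * y - si for ci, ai, si in zip(c, a, s)]
        if mu < tol and abs(r_p) < tol and max(abs(r) for r in r_d) < tol:
            break
        target = 0.1 * mu                    # step 3: mu := 0.1 * mu(x, s)
        r_c = [target - xi * si for xi, si in zip(x, s)]
        # Newton system:  a*dy + ds = r_d,  a^T dx = r_p,  s*dx + x*ds = r_c.
        # Eliminating dx and ds leaves one scalar equation for dy.
        m = sum(ai * ai * xi / si for ai, xi, si in zip(a, x, s))
        rhs = r_p - sum(ai * (rc - xi * rd) / si
                        for ai, rc, xi, rd, si in zip(a, r_c, x, r_d, s))
        dy = rhs / m
        ds = [rd - ai * dy for rd, ai in zip(r_d, a)]
        dx = [(rc - xi * dsi) / si for rc, xi, dsi, si in zip(r_c, x, ds, s)]
        # Step 4: largest alpha in (0, 1] that keeps x and s strictly positive.
        alpha = 1.0
        for v, dv in list(zip(x, dx)) + list(zip(s, ds)):
            if dv < 0.0:
                alpha = min(alpha, -0.9 * v / dv)
        x = [xi + alpha * d for xi, d in zip(x, dx)]
        s = [si + alpha * d for si, d in zip(s, ds)]
        y += alpha * dy
    return x, y, s


# Illustrative problem: min x1 + 2*x2  s.t.  x1 + x2 = 1,  x >= 0.
x, y, s = interior_point_lp([1.0, 1.0], 1.0, [1.0, 2.0])
print(x, y, s)
```

On this example the iterates approach $x = (1, 0)$, $y = 1$, $s = (0, 1)$. For genuinely matrix-valued problems the mappings $F$ and $G$ in (18.14) replace the componentwise products used here.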
Notes 


The seminal works of Alizadeh (1991) and Nesterov and Nemirovskii (1994) 
triggered a massive burst of research activity in optimization. This eventually 
led to a mature theory and computational technology for solving important 
classes of conic programs, notably second-order and semidefinite programming. 
A particularly important development was the extension of primal-dual 
interior-point methods to conic programs over symmetric cones by Nesterov and Todd 
(1997, 1998). Such cones include the non-negative orthant, the second-order 
cone, the semidefinite cone, and Cartesian products of them. The textbook 
by Renegar (2001) gives an excellent exposition of the main advances in conic 
programming. 

The software packages SeDuMi and SDPT3 were developed respectively by the 
late Sturm (1999) and by Toh et al. (1999). These two packages are some of the 
default engines used by the MATLAB-based modeling language CVX. 


Exercises 


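Exercise 18.1 Recall that the trace of a square matrix $M \in \mathbb{R}^{n \times n}$ is 

$$
\operatorname{trace}(M) = \sum_{i=1}^{n} M_{ii}.
$$
Suppose $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$, and $C \in \mathbb{R}^{p \times m}$. Show that the trace satisfies 
the following property: 
$$
\operatorname{trace}(ABC) = \operatorname{trace}(CAB).
$$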
Exercise 18.2 
(a) Suppose $X \in \mathbb{S}^n_+$. Show that 
$$
\operatorname{trace}(X) = 0 \;\Rightarrow\; X = 0.
$$

(b) Suppose $L \in \mathbb{R}^{n \times m}$. Show that $LL^T \in \mathbb{S}^n_+$. The converse is true also but it 
is harder to show: if $X \in \mathbb{S}^n_+$ then there exists $L \in \mathbb{R}^{n \times m}$ for some $m$ such 
that $X = LL^T$. 

(c) Show an example of two matrices $X, S \in \mathbb{S}^n$ such that $XS \notin \mathbb{S}^n$. 

(d) Prove (18.12). That is, show that 

$$
X, S \in \mathbb{S}^n_+, \; \operatorname{trace}(XS) = 0 \;\Rightarrow\; XS = 0.
$$

Hint: Observe that this is not immediate from part (a) because by part (c) 
a priori we do not even know if $XS \in \mathbb{S}^n$. To get around this difficulty, use 
part (b), Exercise 18.1, and part (a). 

Exercise 18.3 Show that the Lorentz cone and the semidefinite cone are 
“self-dual.” In other words, show that 

$$
(\mathbb{L}_n)^* = \mathbb{L}_n
$$
and 
$$
(\mathbb{S}^n_+)^* = \mathbb{S}^n_+.
$$


Exercise 18.4 This exercise shows that semidefinite programming includes as 
special cases both linear and second-order programming. 


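(a) Suppose $x \in \mathbb{R}^n$ and $X = \operatorname{diag}(x) \in \mathbb{S}^n$. Show that 

$$
x \ge 0 \;\Leftrightarrow\; X \succeq 0.
$$

(b) Suppose $x = \begin{bmatrix} x_0 \\ \bar{x} \end{bmatrix} \in \mathbb{R}^n$ and 
$X = \begin{bmatrix} x_0 & \bar{x}^T \\ \bar{x} & x_0 I \end{bmatrix} \in \mathbb{S}^n$. Show that 

$$
x \in \mathbb{L}_n \;\Leftrightarrow\; X \succeq 0.
$$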


(c) Use (a) and (b) to conclude that any linear program or second-order program 
can be recast as a semidefinite program. 

Robust Optimization 


In many optimization models the inputs to the problem are either not known at 
the time the problem must be solved, are computed inaccurately, or are otherwise 
uncertain. Since the solutions obtained can be quite sensitive to these inputs, one 
serious concern is that we are solving the wrong problem, and that the solution 
we find is far from optimal for the correct problem. Robust optimization is an 
approach to optimization problems with data uncertainty to obtain solutions 
that are good for all or most possible realizations of the uncertain parameters. 


Uncertainty Sets 


In robust optimization, the description of the parameter uncertainty is formalized 
via uncertainty sets. Uncertainty sets can represent, or be formed from, 
differences of opinion on the possible values of problem parameters, or alternative 
estimates of parameters generated from historical data via statistical, 
Bayesian, or other estimation techniques. The size of the uncertainty set is 
typically determined by the level of desired robustness. 

Some of the most common types of uncertainty sets encountered in robust 
optimization models include the following: 


• Uncertainty sets representing a finite number of scenarios generated for the 
  possible values of the parameters: 
  $$
  U = \{p_1, p_2, \ldots, p_k\}.
  $$

• Uncertainty sets representing the convex hull of a finite number of scenarios 
  generated for the possible values of the parameters (these are sometimes 
  called polytopic uncertainty sets): 
  $$
  U = \operatorname{conv}\{p_1, p_2, \ldots, p_k\}.
  $$

• Uncertainty sets representing an interval description for each uncertain parameter: 
  $$
  U = \{p : l \le p \le u\}.
  $$
  Confidence intervals encountered frequently in statistics can be the source 
  of such uncertainty sets. 

• Ellipsoidal uncertainty sets: 
  $$
  U = \{p : p = p_0 + Mu, \; \|u\| \le 1\}.
  $$


These uncertainty sets can also arise from statistical estimation in the form 
of confidence regions; see Goldfarb and Iyengar (2003). In addition to their 
mathematically compact description, ellipsoidal uncertainty sets have the 
nice property that they smooth the optimal value function (Werner, 2010). 
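
The set descriptions above translate directly into membership tests. The Python sketch below covers the finite, interval, and ellipsoidal cases; the ellipsoidal test is restricted to two parameters and an invertible matrix $M$ so that $u = M^{-1}(p - p_0)$ can be computed by Cramer's rule. The function names and the numerical data are illustrative assumptions.

```python
import math


def in_finite_set(p, scenarios):
    """p belongs to U = {p_1, ..., p_k}."""
    return tuple(p) in {tuple(q) for q in scenarios}


def in_interval_set(p, lower, upper):
    """p belongs to U = {p : l <= p <= u}, componentwise."""
    return all(lo <= pi <= hi for pi, lo, hi in zip(p, lower, upper))


def in_ellipsoid_2d(p, p0, M, tol=1e-12):
    """p belongs to U = {p0 + M u : ||u|| <= 1} for an invertible 2x2 matrix M."""
    r1, r2 = p[0] - p0[0], p[1] - p0[1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0.0:
        raise ValueError("M must be invertible for this membership test")
    u1 = (r1 * M[1][1] - M[0][1] * r2) / det
    u2 = (M[0][0] * r2 - M[1][0] * r1) / det
    return math.hypot(u1, u2) <= 1.0 + tol


# Illustrative data: two uncertain expected returns (assumption).
p0 = (0.05, 0.03)
M = [[0.02, 0.0], [0.01, 0.01]]
inside = in_ellipsoid_2d((0.06, 0.035), p0, M)    # corresponds to u = (0.5, 0)
outside = in_ellipsoid_2d((0.10, 0.03), p0, M)    # corresponds to u1 = 2.5
in_box = in_interval_set((0.05, 0.03), (0.04, 0.02), (0.06, 0.04))
print(inside, outside, in_box)
```

Scaling $M$ by a factor larger than one enlarges the ellipsoid, which is one concrete way in which the size of the set encodes the desired level of robustness.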


It is a non-trivial task to determine the uncertainty set that is appropriate for 
a particular model as well as the type of uncertainty sets that lead to tractable 
problems. As a general guideline, the shape of the uncertainty set will often 
depend on the sources of uncertainty as well as the sensitivity of the solutions to 
these uncertainties. The size of the uncertainty set, on the other hand, will often 
be chosen based on the desired level of robustness. When uncertain parameters 
reflect the “true” values of moments of random variables, as is the case in 
mean-variance portfolio optimization, we simply have no way of knowing these 
unobservable true values exactly. In such cases, after making some assumptions 
about the stationarity of these random processes, we can generate estimates of 
these true parameters using statistical procedures. Goldfarb and Iyengar (2003), 
for example, show that if we use a linear factor model for the multivariate returns 
of several assets and estimate the factor loading matrices via linear regression, the 
confidence regions generated for these parameters are ellipsoidal sets and they 
advocate their use in robust portfolio selection as uncertainty sets. To generate 
interval-type uncertainty sets, Tütüncü and Koenig (2004) use bootstrapping 
strategies as well as moving averages of returns from historical data. The shape 
and the size of the uncertainty set can significantly affect the robust solutions 
generated. However, with few guidelines backed by theoretical and empirical 
studies, their choice remains a mix of art and science. 


Different Flavors of Robustness 


As we next describe, different types of robustness arise depending on what 
parameters of a problem are uncertain, and depending also on what exactly 
constitutes a “good” robust solution. 


Constraint Robustness 


Constraint robustness refers to situations where the uncertainty is in the con- 
straints and we seek solutions that remain feasible for all possible values of 
the uncertain inputs. This type of solution is required in many engineering 
applications. Typical instances include multi-stage problems where the uncertain 
outcomes of earlier stages have an effect on the decisions of the later stages and 
the decision variables must be chosen to satisfy constraints no matter what 
happens with the uncertain parameters of the problem. 
Here is a precise mathematical model for finding constraint-robust solutions. 
Consider an optimization problem of the form 

$$
\begin{array}{rl}
\min\limits_{x} & f(x)\\
\text{s.t.} & G(x, p) \in K.
\end{array}
\tag{19.1}
$$


In this problem x is the vector of decision variables, f(x) is the (certain) objective 
function, G and K are the structural elements of the constraints assumed to be 
certain, and p is the vector of possibly uncertain parameters of the problem. 
Consider an uncertainty set U that contains all possible values of the uncertain 
parameters p. Then, a constraint-robust optimal solution can be found by solving 
the following problem: 


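$$
\begin{array}{rl}
\min\limits_{x} & f(x)\\
\text{s.t.} & G(x, p) \in K, \quad \text{for all } p \in U.
\end{array}
\tag{19.2}
$$

The feasible set in the robust optimization model (19.2) is the intersection of the 
feasible sets: 

$$
S(p) = \{x : G(x, p) \in K\}, \quad p \in U.
$$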

We note that there are no uncertain parameters in the objective function of the 
problem (19.1). However, this is not a restrictive assumption. An optimization 
problem with uncertain parameters in both the objective function and con- 
straints can be easily reformulated to fit the form in (19.1). Indeed, the problem 

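$$
\begin{array}{rl}
\min\limits_{x} & f(x, p)\\
\text{s.t.} & G(x, p) \in K
\end{array}
$$

is equivalent to the problem 

$$
\begin{array}{rl}
\min\limits_{x,t} & t\\
\text{s.t.} & f(x, p) \le t\\
& G(x, p) \in K.
\end{array}
$$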


This last problem has all its uncertainties in its constraints only. 


Objective Robustness 


Another important robustness concept is objective robustness. This refers to 
solutions that will remain close to optimal for all possible realizations of the 
uncertain problem parameters. Since such solutions may be difficult to obtain, 
especially when uncertainty sets are relatively large, an alternative goal for 
objective robustness is to find solutions whose worst-case behavior is optimized. 
The worst-case behavior of a solution corresponds to the value of the objective 
function for the worst possible realization of the uncertain data for that particular 
solution. We now develop a mathematical model that addresses objective 
robustness. Consider an optimization problem of the form: 

$$
\min_{x \in S} f(x, p).
$$

Here, $S$ is a (certain) feasible set and $f(x, p)$ is the objective function that 
depends on uncertain parameters $p$. As before, let $U$ denote the uncertainty set 
that contains all possible values of the uncertain parameters $p$. Then an objective 
robust solution can be obtained by solving the saddle-point problem 

$$
\min_{x \in S} \max_{p \in U} f(x, p).
$$


As indicated at the end of the previous subsection, objective robustness can be 
seen as a special case of constraint robustness via a suitable reformulation. How- 
ever, it is important to distinguish between these two problem variants as their 
“natural” robust formulations lead to two different classes of optimization for- 
mulations. Robust-constraint problems naturally lead to optimization problems 
with infinitely many constraints whereas robust-objective problems naturally 
lead to saddle-point problems. There are different methodologies available for 
each of these two problem classes. 
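
When both the feasible set and the uncertainty set are finite, the saddle-point problem can be solved by plain enumeration, which makes the min-max structure explicit. The Python sketch below does this for a made-up one-dimensional example with $f(x, p) = (x - p)^2$; the candidate grid, the scenarios, and the function names are assumptions of the illustration.

```python
def worst_case(f, x, scenarios):
    """Inner problem: max over p in U of f(x, p)."""
    return max(f(x, p) for p in scenarios)


def robust_objective_solution(f, candidates, scenarios):
    """Outer problem: the candidate x with the smallest worst-case objective."""
    return min(candidates, key=lambda x: worst_case(f, x, scenarios))


def f(x, p):
    # Illustrative objective (assumption): squared distance to the parameter.
    return (x - p) ** 2


candidates = [0.5 * i for i in range(7)]   # S = {0, 0.5, ..., 3}
scenarios = [0.0, 1.0, 3.0]                # U = {0, 1, 3}

x_robust = robust_objective_solution(f, candidates, scenarios)
value = worst_case(f, x_robust, scenarios)
print(x_robust, value)
```

The robust choice sits midway between the two extreme scenarios, since those are the ones that determine the worst case.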


Relative Robustness 


The focus of constraint and objective robustness models on an absolute measure 
of worst-case performance is not consistent with risk tolerances of many decision 
makers. Instead, we may prefer to measure the worst case in a relative manner, 
relative to the best possible solution under each scenario. This leads us to the 
notion of relative robustness. Consider the optimization problem 


$$
\min_{x \in S} f(x, p),
\tag{19.3}
$$


where p is uncertain with uncertainty set U. To simplify the description, we 
restrict our attention to the case with objective uncertainty and assume that the 
constraints are certain. Given a fixed p € U, let z*(p) denote the optimal value 
function 


$$
z^*(p) := \min_{x \in S} f(x, p).
$$

Furthermore, define the optimal solution map 
$$
x^*(p) := \arg\min_{x \in S} f(x, p).
$$


Note that z*(p) can be extended-valued and x* (p) can be set-valued. To motivate 
the notion of relative robustness we first define a measure of regret associated 
with a decision after the uncertainty is resolved. If we choose x as our vector 
and p is the realized value of the uncertain parameter, the regret associated with 
choosing x instead of an element of x*(p) is defined as 


$$
r(x, p) = f(x, p) - z^*(p) = f(x, p) - f(x^*(p), p).
$$
Note that the regret function is always non-negative and can also be regarded 
as a measure of the “benefit of hindsight”. Now, for a given $x \in S$ consider the 
maximum regret function: 
$$
R(x) := \max_{p \in U} r(x, p) = \max_{p \in U} \bigl( f(x, p) - z^*(p) \bigr).
$$
A relative robust solution to problem (19.3) is a vector $x$ that minimizes the 
maximum regret: 
$$
\min_{x \in S} \max_{p \in U} \bigl( f(x, p) - z^*(p) \bigr).
\tag{19.4}
$$

While they are intuitively attractive, relative robust formulations can also be sig- 
nificantly more difficult than the standard absolute robust formulations. Indeed, 
since z*(p) is the optimal value function and involves an optimization problem 
itself, the problem (19.4) is a three-level optimization problem as opposed to the 
two-level problems in absolute robust formulations. Furthermore, the optimal 
value function z*(p) is rarely available in analytic form, is typically non-smooth, 
and is often hard to analyze. Another difficulty is that if f is linear in p, as is often 
the case, then z*(p) is a concave function. Therefore, the inner maximization 
problem in (19.4) is a convex maximization problem and is difficult for most U. 
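
A tiny finite example shows that the absolute and relative notions of robustness can select different decisions. In the Python sketch below there are three candidate decisions and two scenarios, and the cost table is invented for illustration; with finite sets the three optimization levels of (19.4) reduce to nested `min` and `max` calls.

```python
# costs[x][j] = f(x, p_j) for decision x under scenario p_j (illustrative numbers).
costs = {
    "A": [6.0, 6.0],
    "B": [1.0, 8.0],
    "C": [5.0, 7.0],
}


def optimal_values(costs):
    """z*(p_j): the best achievable cost under each scenario."""
    k = len(next(iter(costs.values())))
    return [min(row[j] for row in costs.values()) for j in range(k)]


def max_regret(row, z_star):
    """R(x): the largest regret f(x, p_j) - z*(p_j) over the scenarios."""
    return max(fj - zj for fj, zj in zip(row, z_star))


z_star = optimal_values(costs)
absolute_choice = min(costs, key=lambda x: max(costs[x]))
relative_choice = min(costs, key=lambda x: max_regret(costs[x], z_star))
print(z_star, absolute_choice, relative_choice)
```

Decision A has the best worst-case cost, yet it gives up a great deal in the first scenario; decision B is never far from the best choice in hindsight and therefore minimizes the maximum regret.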

A simpler variant of (19.4) can be constructed by deciding on the maximum 
level of regret to be tolerated beforehand and by solving a feasibility problem 
instead with this level imposed as a constraint. For example, if we decide to limit 
the maximum regret to R, then the problem to solve becomes the following: find 
an x satisfying x € S such that 


$$
f(x, p) - z^*(p) \le R, \quad \text{for all } p \in U.
$$


If desired, one can then perform a bisection search on R to find its optimal 
value. Another variant of relative robustness models arises when we measure the 
regret in terms of the proximity of our chosen solution to the optimal solution 
set rather than in terms of the optimal objective values. For this model, consider 
the following distance function for a given x and p: 
$$
d(x, p) = \inf_{x^* \in x^*(p)} \|x - x^*\|.
$$

When the solution set is a singleton, there is no optimization involved in the 
definition. As above, we then consider the maximum distance function 

$$
D(x) := \max_{p \in U} d(x, p) = \max_{p \in U} \inf_{x^* \in x^*(p)} \|x - x^*\|.
$$

For relative robustness in this new sense, we seek $x$ that solves 

$$
\min_{x \in S} \max_{p \in U} d(x, p).
\tag{19.5}
$$


This variant is an attractive model for cases where we have time to revise our 
decision variables x, perhaps only slightly, once p is revealed. In such cases, we 
will want to choose an x that will not need much perturbation under any scenario, 
i.e., we seek the solution to (19.5). This model can also be useful for multi-period 
problems where revisions of decisions between periods can be costly. Portfolio 
rebalancing problems with transaction costs are examples of such settings. 


Techniques for Solving Robust Optimization Models 


In this section we review a few of the commonly used techniques for the solution 
of robust optimization problems. The tools we discuss are essentially reformula- 
tion strategies for robust optimization problems so that they can be rewritten as a 
deterministic optimization problem with no uncertainty. In these reformulations, 
we look for economy, so that the new formulation is not much bigger than the 
original, “uncertain” problem, and tractability, so that the new problem can be 
solved efficiently using standard optimization methods. 

The variety of the robustness models and the types of uncertainty sets rule out 
a unified approach. However, there are some common threads and the material 
in this section can be seen as a guide to the available tools which can be 
combined or appended with other techniques to solve a given problem in the 
robust optimization setting. 


Sampling 


One of the simplest strategies for achieving robustness under uncertainty is to 
sample several scenarios for the uncertain parameters from a set that contains 
possible values of these parameters. This sampling can be done with or with- 
out using distributional assumptions on the parameters and produces a robust 
optimization formulation with a finite uncertainty set. If uncertain parameters 
appear in the constraints, we create a copy of each such constraint corresponding 
to each scenario. Uncertainty in the objective function can be handled in a similar 
manner. Consider the generic uncertain optimization problem 


$$
\begin{array}{ll}
\min_x & f(x,p) \\
\text{s.t.} & G(x,p) \in K \quad \text{for all } p \in \mathcal{U}.
\end{array}
$$


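If the uncertainty set $\mathcal{U}$ is a finite set, i.e., $\mathcal{U} = \{p_1, p_2, \ldots, p_k\}$, the robust
formulation is obtained as follows:

$$
\begin{array}{lll}
\min_{x,t} & t & \\
\text{s.t.} & f(x,p_i) \le t, & i = 1,\ldots,k \\
 & G(x,p_i) \in K, & i = 1,\ldots,k.
\end{array}
$$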


Note that no reformulation is necessary in this case and the duplicated 
constraints preserve the structural properties (linearity, convexity, etc.) of the 
original constraints. Consequently, when the uncertainty set is a finite set the 
resulting robust optimization problem is larger but theoretically no more difficult
than the non-robust version of the problem. The situation is somewhat similar
to stochastic programming formulations. Examples of robust optimization
formulations with finite uncertainty sets can be found in Rustem and Howe
(2002).
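
The scenario-copy construction can be sketched in a few lines for the linear special case. The sketch below assumes, purely for illustration, that the objective is $c(p)^T x$ and that each uncertain constraint has the form $a(p)^T x \le b(p)$; the function names `robust_lp_rows` and `worst_case_objective` are hypothetical helpers, not part of any library. It builds the inequality rows of the epigraph formulation over the stacked variable $(x, t)$, which could then be handed to any LP solver.

```python
def robust_lp_rows(objective_scenarios, constraint_scenarios):
    """Rows (A_ub, b_ub) over z = (x_1, ..., x_n, t) for the robust problem
        min t  s.t.  c_i^T x <= t  and  a_j^T x <= b_j  for every scenario.
    """
    rows, rhs = [], []
    for c in objective_scenarios:
        # Epigraph row for one objective scenario: c^T x - t <= 0.
        rows.append([float(ci) for ci in c] + [-1.0])
        rhs.append(0.0)
    for a, b in constraint_scenarios:
        # One copy of the uncertain constraint per scenario: a^T x <= b.
        rows.append([float(ai) for ai in a] + [0.0])
        rhs.append(float(b))
    return rows, rhs


def worst_case_objective(x, objective_scenarios):
    """Largest objective value of x over the finite scenario set."""
    return max(sum(ci * xi for ci, xi in zip(c, x)) for c in objective_scenarios)
```

Each scenario contributes rows of the same type as the original constraint, so a linear problem stays linear; only the row count grows with the number of scenarios.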


Conic Optimization 


Moving from finite uncertainty sets to continuous sets such as intervals or ellip- 
soids presents a theoretical challenge. The robust version of an uncertain con- 
straint that has to be satisfied for all values of the uncertain parameters in a 
continuous set results in a semi-infinite optimization formulation. These problems 
are called semi-infinite since there are infinitely many constraints but only finitely 
many variables. 

Fortunately, for some types of uncertainty sets, it is possible to reformulate 
their robust semi-infinite programming versions using a finite set of conic con- 
straints. To illustrate this, consider the following simple linear program: 


$$
\begin{array}{ll}
\max_x & r^T x \\
\text{s.t.} & \mathbf{1}^T x = 1 \\
 & x \ge 0.
\end{array}
\tag{19.6}
$$


What is the optimal solution to this linear program? 
Now suppose the objective coefficients are uncertain with ellipsoidal uncertainty,
e.g., suppose the objective coefficient vector r can be any element in the
uncertainty set

$$ \mathcal{U} = \{ r : \|r - \mu\|_2 \le \delta \}, $$

where $\mu$ is the “nominal” value of r. The robust version of (19.6) is

$$
\begin{array}{ll}
\max_x & \min_{r \in \mathcal{U}} \ r^T x \\
\text{s.t.} & \mathbf{1}^T x = 1 \\
 & x \ge 0.
\end{array}
$$

Some simple calculations show that for a given x

$$ \min_{r \in \mathcal{U}} r^T x = \mu^T x - \delta \cdot \|x\|_2. $$
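
The calculation behind this identity is one application of the Cauchy-Schwarz inequality. A short sketch, writing $r = \mu + \delta u$ with $\|u\|_2 \le 1$ (the auxiliary vector $u$ is introduced here only for the derivation):

```latex
\begin{aligned}
\min_{r \in \mathcal{U}} r^T x
  &= \min_{\|u\|_2 \le 1} (\mu + \delta u)^T x
   = \mu^T x + \delta \min_{\|u\|_2 \le 1} u^T x \\
  &= \mu^T x - \delta \|x\|_2 ,
\end{aligned}
```

since $u^T x \ge -\|u\|_2 \|x\|_2 \ge -\|x\|_2$, with equality at $u = -x/\|x\|_2$ when $x \ne 0$.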


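Thus the robust version of (19.6) is

$$
\begin{array}{ll}
\max_x & \mu^T x - \delta \cdot \|x\|_2 \\
\text{s.t.} & \mathbf{1}^T x = 1 \\
 & x \ge 0.
\end{array}
$$

The latter problem can be rewritten as the following conic program:

$$
\begin{array}{ll}
\max_{x,t} & \mu^T x - \delta \cdot t \\
\text{s.t.} & \mathbf{1}^T x = 1 \\
 & \begin{bmatrix} t \\ x \end{bmatrix} \in \mathbb{L}_{n+1} \\
 & x \ge 0.
\end{array}
$$
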
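More generally, suppose r has the following ellipsoidal uncertainty set:

$$ \mathcal{U} = \{ r : (r - \mu)^T \Sigma^{-1} (r - \mu) \le \delta^2 \} $$

for some symmetric and positive definite matrix $\Sigma$. Then the robust version of
(19.6) is

$$
\begin{array}{ll}
\max_x & \mu^T x - \delta \cdot \sqrt{x^T \Sigma x} \\
\text{s.t.} & \mathbf{1}^T x = 1 \\
 & x \ge 0,
\end{array}
$$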


which again can be formulated as a conic program. Observe the resemblance 
between the latter model and Markowitz’s mean-variance model. 

The machinery of robust optimization can be applied to mean-variance port- 
folio optimization to mitigate the effects of estimation errors in the expected 
returns and/or in the covariance matrix (Ceria and Stubbs, 2006; Goldfarb and 
Iyengar, 2003). The basic idea is to consider the mean-variance optimization 
problem in one of its forms, e.g., 


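$$
\begin{array}{ll}
\max_x & \mu^T x - \tfrac{1}{2}\gamma \cdot x^T V x \\
\text{s.t.} & Ax = b \\
 & Cx \le d,
\end{array}
\tag{19.7}
$$

and assume $\mu$ belongs to some uncertainty set,

$$ \mathcal{U} = \{ \mu : (\mu - \hat{\mu})^T \Sigma^{-1} (\mu - \hat{\mu}) \le \delta^2 \}. $$

Then the robust version of (19.7) is

$$
\begin{array}{ll}
\max_x & \hat{\mu}^T x - \delta \cdot \sqrt{x^T \Sigma x} - \tfrac{1}{2}\gamma \cdot x^T V x \\
\text{s.t.} & Ax = b \\
 & Cx \le d.
\end{array}
\tag{19.8}
$$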


We next show that (19.8) is a conic program. To that end, it suffices to find
a conic representation of the objective function. Let $R \in \mathbb{R}^{n \times p}$, $L \in \mathbb{R}^{n \times q}$ be
such that $\Sigma = RR^T$ and $V = LL^T$. Both R and L exist because $\Sigma$ and V are
positive semidefinite. By introducing new variables s, t, the problem (19.8) can
be rewritten as the following conic program:

$$
\begin{array}{ll}
\max_{x,s,t} & \hat{\mu}^T x - \delta \cdot s - \tfrac{1}{2}\gamma \cdot t \\
\text{s.t.} & \begin{bmatrix} s \\ R^T x \end{bmatrix} \in \mathbb{L}_{p+1} \\
 & \begin{bmatrix} t+1 \\ t-1 \\ 2L^T x \end{bmatrix} \in \mathbb{L}_{q+2} \\
 & Ax = b \\
 & Cx \le d.
\end{array}
$$
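
To see why these two cone memberships reproduce the objective of (19.8), recall that a vector $(z_0, \bar{z})$ lies in the second-order cone exactly when $\|\bar{z}\|_2 \le z_0$. A sketch of the two equivalences, assuming the factors enter as $R^T x$ and $L^T x$ so that the dimensions $p+1$ and $q+2$ agree:

```latex
\begin{aligned}
\begin{bmatrix} s \\ R^T x \end{bmatrix} \in \mathbb{L}_{p+1}
  &\iff \|R^T x\|_2 \le s
   \iff \sqrt{x^T \Sigma x} \le s, \\
\begin{bmatrix} t+1 \\ t-1 \\ 2L^T x \end{bmatrix} \in \mathbb{L}_{q+2}
  &\iff (t-1)^2 + 4\, x^T V x \le (t+1)^2 \ \text{ and } \ t+1 \ge 0
   \iff x^T V x \le t .
\end{aligned}
```

The last step uses $(t+1)^2 - (t-1)^2 = 4t$. Since the objective rewards small s and t when $\delta, \gamma > 0$, both inequalities hold with equality at an optimal solution.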


Saddle-Point Characterizations 


For the solution of problems arising from objective uncertainty, the robust solu- 
tion can be characterized using saddle-point conditions when the original prob- 
lem satisfies certain convexity assumptions. The benefit of this characterization is 
that we can then use algorithms such as interior-point methods already developed 
and available for saddle-point problems. As an example of this strategy, consider 
the objective-robust formulation discussed in Section 19.2: 


$$ \min_{x \in S} \ \max_{p \in \mathcal{U}} \ f(x,p). \tag{19.9} $$

We note that the dual of this robust optimization problem is obtained by changing
the order of the minimization and maximization problems:

$$ \max_{p \in \mathcal{U}} \ \min_{x \in S} \ f(x,p). \tag{19.10} $$


Under mild assumptions on f, S, and $\mathcal{U}$, there exists a saddle-point solution
$(x^*, p^*) \in S \times \mathcal{U}$ such that

$$ f(x^*,p) \le f(x^*,p^*) \le f(x,p^*) \quad \text{for all } x \in S,\ p \in \mathcal{U}. $$


This characterization is the basis of the robust optimization algorithms given 
in Tütüncü and Koenig (2004). 


Some Robust Optimization Models in Finance 


Since many financial optimization problems involve future values of security 
prices, interest rates, exchange rates, etc., which are not known in advance 
but can only be forecasted or estimated, such problems fit perfectly into the 
framework of robust optimization. We next describe some examples of robust 
optimization formulations for a variety of financial optimization problems.


Robust Profit Opportunities in Risky Portfolios


Consider an investment environment with n financial securities whose future
price vector $r \in \mathbb{R}^n$ is a random variable. Let $p \in \mathbb{R}^n$ represent the current
prices of these securities. If the investor chooses a portfolio $x = [x_1 \ \cdots \ x_n]^T$
that satisfies

$$ p^T x < 0 $$

and the realization of the random variable r satisfies

$$ r^T x \ge 0, \tag{19.11} $$

then there is an arbitrage opportunity: an investor could make money by
constructing the portfolio x with negative cash flow (pocketing money) and
subsequently collecting the non-negative cash flow $r^T x$ of the portfolio x.

Since arbitrage opportunities generally do not persist in financial markets, one 
might be interested in the alternative and weaker profitability notion where the 
non-negativity of the portfolio is only guaranteed to occur with high probability. 
More precisely, consider the following relaxation of (19.11): 


$$ P(r^T x \ge 0) \ge 0.99. \tag{19.12} $$


Let $\mu$ and Q represent the expected future price vector and covariance matrix
of the random vector r. Then $E(r^T x) = \mu^T x$ and $\mathrm{stdev}(r^T x) = \sqrt{x^T Q x}$. If the
random vector r is Gaussian, then (19.12) is equivalent to

$$ \mu^T x - \theta \cdot \sqrt{x^T Q x} \ge 0, $$

where $\theta = \Phi^{-1}(0.99)$ and $\Phi$ is the standard normal cumulative distribution
function. Therefore, if we find an x satisfying

$$ \mu^T x - \theta \cdot \sqrt{x^T Q x} \ge 0, \qquad p^T x < 0, $$


for a large enough positive value of $\theta$, we have an approximation of an arbitrage
opportunity. Note that, by relaxing the constraint $p^T x < 0$ as $p^T x \le 0$ or as
$p^T x \le -\varepsilon$, we obtain a conic feasibility system. Therefore, the resulting system
can be solved using the conic optimization approaches.
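
The Gaussian reformulation of (19.12) can be checked numerically with the standard library alone. The following is a minimal sketch, assuming a Gaussian price vector and a portfolio with strictly positive variance; the helper names `chance_margin` and `prob_nonnegative` and the sample data in the usage note are illustrative choices, not taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def _mean_and_std(mu, Q, x):
    """Mean mu^T x and standard deviation sqrt(x^T Q x) of the payoff r^T x."""
    n = len(x)
    mean = sum(mu[i] * x[i] for i in range(n))
    var = sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))
    return mean, sqrt(var)


def chance_margin(mu, Q, x, level=0.99):
    """Left-hand side mu^T x - theta * sqrt(x^T Q x), theta = Phi^{-1}(level)."""
    theta = NormalDist().inv_cdf(level)
    mean, std = _mean_and_std(mu, Q, x)
    return mean - theta * std


def prob_nonnegative(mu, Q, x):
    """P(r^T x >= 0) when r is Gaussian; assumes x^T Q x > 0."""
    mean, std = _mean_and_std(mu, Q, x)
    return NormalDist().cdf(mean / std)
```

For instance, with `mu = (1.0, 1.2)` and `Q = ((0.04, 0.01), (0.01, 0.09))`, the portfolio `(1.0, 0.0)` has a positive margin, while `(1.0, -0.5)` has a negative margin and a probability of a non-negative payoff below 0.99, so the two tests agree.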
We next explore some portfolio selection models that incorporate the uncer- 
tainty of problem inputs. 


Robust Portfolio Selection 


This section is adapted from Tütüncü and Koenig (2004). Recall that Markowitz’s 
mean-variance optimization problem can be stated in the following form, which 
combines the reward and risk in the objective function: 


$$ \max_{x \in \mathcal{X}} \ \mu^T x - \frac{\gamma}{2}\, x^T Q x. \tag{19.13} $$


Here $\mu$ and Q are respectively estimates of the vector of expected values and
covariance of returns of a universe of securities, and $\gamma$ is a risk-aversion constant
used to trade off the reward (expected return) and risk (portfolio variance). The
set X is the set of feasible portfolios which may carry information on short- 
sale restrictions, sector distribution requirements, etc. Since such restrictions are 
predetermined, we can assume that the set X is known without any uncertainty 
at the time the problem is solved. 

Recall also that solving the problem above for different values of $\gamma$ we obtain
the efficient frontier of the set of feasible portfolios. The optimal portfolio will be 
different for individuals with different risk-taking tendencies, but it will always 
be on the efficient frontier. 

One of the limitations of this model is its need to accurately estimate the 
expected returns and covariances. In Bawa et al. (1979), the authors argue that 
using estimates of the unknown expected returns and covariances leads to an 
estimation risk in portfolio choice, and that methods for optimal selection of 
portfolios must take this risk into account. Furthermore, the optimal solution 
is sensitive to perturbations in these input parameters — a small change in 
the estimate of the return or the variance may lead to a large change in the 
corresponding solution; see, for example, Michaud and Michaud (2008). This 
attribute is unfavorable since the modeler may want to periodically rebalance 
the portfolio based on new data and may incur significant transaction costs to 
do so. Furthermore, using point estimates of the expected return and covariance 
parameters does not fulfill the needs of a conservative investor. Such an investor 
would not necessarily trust these estimates and would be more comfortable 
choosing a portfolio that will perform well under a number of different scenarios. 
Of course, such an investor cannot expect to get better performance on some of 
the more likely scenarios, but will have insurance for more extreme cases. All 
these arguments point to the need of a portfolio optimization formulation that 
incorporates robustness and tries to find a solution that is relatively insensitive 
to inaccuracies in the input data. Since all the uncertainty is in the objec- 
tive function coefficients, we seek an objective robust portfolio, as outlined in 
Section 19.2. 

For robust portfolio optimization we consider a model that allows return and 
covariance matrix information to be given in the form of intervals. For example,
this information may take the form “the expected return on security j is
between 8% and 10%” rather than claiming that it is 9%. Mathematically, we
will represent this information as membership in the following set:

$$ \mathcal{U} = \{ (\mu, Q) : \mu^L \le \mu \le \mu^U, \ Q^L \le Q \le Q^U, \ Q \succeq 0 \}, \tag{19.14} $$

where $\mu^L, \mu^U, Q^L, Q^U$ are the extreme values of the intervals we just mentioned.
The restriction $Q \succeq 0$ is necessary since Q is a covariance matrix and, therefore,
must be positive semidefinite. These intervals may be generated in different ways.
An extremely cautious modeler may want to use historical lows and highs of 
certain input parameters as the range of their values. One may generate different 
estimates using different scenarios on the general economy and then combine the 
resulting estimates. Different analysts may produce different estimates for these 
parameters and one may choose the extreme estimates as the endpoints of the
intervals. One may choose a confidence level and then generate estimates of
covariance and return parameters in the form of prediction intervals.

We want to find a portfolio that maximizes the objective function in (19.13) in
the worst-case realization of the input parameters $\mu$ and Q from their uncertainty
set $\mathcal{U}$ in (19.14). Given these considerations the robust optimization problem
takes the following form:

$$ \max_{x \in \mathcal{X}} \ \min_{(\mu,Q) \in \mathcal{U}} \ \mu^T x - \frac{\gamma}{2}\, x^T Q x. \tag{19.15} $$

This problem can be expressed as a saddle-point problem and be solved using
the technique outlined in Halldórsson and Tütüncü (2003).


Relative Robustness in Portfolio Selection 


We consider the following simple three-asset portfolio model from Ceria and 
Stubbs (2006): 


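$$
\begin{array}{ll}
\max_x & \mu^T x \\
\text{s.t.} & \mathrm{TE}(x) \le 0.1 \\
 & \mathbf{1}^T x = 1 \\
 & x \ge 0,
\end{array}
\tag{19.16}
$$

where $x = [x_1 \ x_2 \ x_3]^T$ and

$$
\mathrm{TE}(x) = \sqrt{
\begin{bmatrix} x_1 - 0.5 \\ x_2 - 0.5 \\ x_3 \end{bmatrix}^T
\begin{bmatrix} 0.1764 & 0.09702 & 0 \\ 0.09702 & 0.1089 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} x_1 - 0.5 \\ x_2 - 0.5 \\ x_3 \end{bmatrix}
}.
$$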


This is essentially a two-asset portfolio optimization problem where the third 
asset represents the proportion of the funds that are not invested. The first two 
assets have standard deviations of 42% and 33% respectively and a correlation 
coefficient of 0.7. The “benchmark” is the portfolio that invests funds half-and- 
half in the two assets. The function TE(x) represents the tracking error of the 
portfolio with respect to the half-and-half benchmark and the first constraint 
indicates that this tracking error should not exceed 10%. The second constraint 
is the budget constraint; the third enforces no shorting. We depict the projection 
of the feasible set of this problem onto the space spanned by variables $x_1$ and $x_2$
in Figure 19.1. 

Figure 19.1 The feasible set of the mean-variance model (19.16)

We now build a relative robustness model for this portfolio problem. We
assume that the covariance matrix estimate is certain. We consider a very simple
uncertainty set for the expected return estimates consisting of three scenarios
represented with the three arrows in Figure 19.2. These three scenarios correspond
to the following values for $\mu$: (6,4,0), (5,5,0), and (4,6,0). When
$\mu = (6,4,0)$ the optimal solution is (0.831, 0.169, 0) with objective value 5.662.
Similarly, when $\mu = (4,6,0)$ the optimal solution is (0.169, 0.831, 0) with objective
value 5.662. When $\mu = (5,5,0)$ all points between the previous two optimal
solutions are optimal with a shared objective value of 5.0. Therefore, the relative
robust formulation for this problem can be written as follows:


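$$
\begin{array}{ll}
\min_{x,t} & t \\
\text{s.t.} & 5.662 - (6x_1 + 4x_2) \le t \\
 & 5.662 - (4x_1 + 6x_2) \le t \\
 & 5 - (5x_1 + 5x_2) \le t \\
 & \mathrm{TE}(x) \le 0.1 \\
 & \mathbf{1}^T x = 1 \\
 & x \ge 0.
\end{array}
\tag{19.17}
$$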


Instead of solving the problem where the optimal regret level is a variable (t 
in the formulation), an easier strategy is to choose a level of regret that can 
be tolerated and find portfolios that do not exceed this level of regret in any 
scenario. For example, choosing a maximum tolerable regret level of 0.75 we get 
the following feasibility problem: 


$$
\begin{array}{l}
5.662 - (6x_1 + 4x_2) \le 0.75 \\
5.662 - (4x_1 + 6x_2) \le 0.75 \\
5 - (5x_1 + 5x_2) \le 0.75 \\
\mathrm{TE}(x) \le 0.1 \\
\mathbf{1}^T x = 1 \\
x \ge 0.
\end{array}
$$


This problem and its feasible set of solutions are illustrated in Figure 19.2.

Figure 19.2 Set of solutions with regret less than 0.75 for the mean-variance
model (19.16)
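
The regret constraints of this example are easy to evaluate directly. The sketch below assumes the tracking error is the square root of the quadratic form built from the stated standard deviations (42%, 33%) and correlation (0.7); the function names are hypothetical and the scenario optima 5.662, 5.0, 5.662 are taken from the text rather than recomputed.

```python
import math

# Scenario return vectors mapped to their optimal objective values (from the text).
SCENARIOS = {(6.0, 4.0, 0.0): 5.662, (5.0, 5.0, 0.0): 5.0, (4.0, 6.0, 0.0): 5.662}
# Covariance of the two risky assets: 0.42^2, 0.33^2, and 0.7 * 0.42 * 0.33.
COV = ((0.1764, 0.09702), (0.09702, 0.1089))


def tracking_error(x):
    """Tracking error of x relative to the half-and-half benchmark."""
    d = (x[0] - 0.5, x[1] - 0.5)
    quad = sum(d[i] * COV[i][j] * d[j] for i in range(2) for j in range(2))
    return math.sqrt(max(quad, 0.0))


def max_regret(x):
    """Worst regret of portfolio x over the three scenarios."""
    return max(best - sum(m * xi for m, xi in zip(mu, x))
               for mu, best in SCENARIOS.items())


def is_feasible(x, tol=1e-9):
    """Check the constraints of (19.16): TE(x) <= 0.1, budget, no shorting."""
    return (tracking_error(x) <= 0.1 + tol
            and abs(sum(x) - 1.0) <= tol
            and all(xi >= -tol for xi in x))


def within_regret(x, level):
    """Feasibility for the regret-limited system with tolerated regret `level`."""
    return is_feasible(x) and max_regret(x) <= level
```

Evaluating `max_regret((0.5, 0.5, 0.0))` gives 0.662, so the benchmark itself satisfies the 0.75 regret limit, whereas the scenario optimum (0.831, 0.169, 0) has regret 1.324 under $\mu = (4,6,0)$. Adding the first two regret constraints of (19.17) and using $x_1 + x_2 \le 1$ shows that no feasible portfolio can have maximum regret below 0.662.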


Notes 


Robust optimization was introduced by Ben-Tal and Nemirovski (1998, 2002) 
and independently by El Ghaoui and Lebret (1997) and El Ghaoui et al. (1998). 
The textbook by Ben-Tal et al. (2009) gives a thorough discussion on the subject, 
including an extensive list of references. 

Although robust optimization is widely popular in a variety of disciplines, it 
is not as widespread in financial optimization yet. There are strong supporters 
of its potential in finance (Ceria and Stubbs, 2006; Goldfarb and Iyengar, 2003; 
Tütüncü and Koenig, 2004). There are also some skeptics (Scherer, 2007). 


Exercises 


Exercise 19.1 Consider the optimization problem 


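$$
\begin{array}{ll}
\max_x & \mu^T x - \tfrac{1}{2}\gamma \cdot x^T V x \\
\text{s.t.} & \mathbf{1}^T x = 1,
\end{array}
\tag{19.18}
$$

where V is a positive definite covariance matrix of asset returns, $\mu$ is a vector
of expected returns, and $\gamma > 0$ is a risk-aversion constant. Assume V is certain
but $\mu$ is uncertain.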


(a) Let $z(\mu)$ denote the optimal value of (19.18). Show that $\mu \mapsto z(\mu)$ is a
quadratic convex function.

(b) Let $\mathcal{U}$ denote the uncertainty set for $\mu$. Formulate both the absolute and
relative robust optimization versions of (19.18).

(c) Show that the absolute and relative robust optimization versions for the
uncertainty sets $\mathcal{U} = \{\mu^1, \ldots, \mu^k\}$ and $\mathcal{U} = \mathrm{conv}\{\mu^1, \ldots, \mu^k\}$ are equivalent.


Exercise 19.2 (Robust least squares) Let $P \in \mathbb{R}^{m \times n}$, with $q \in \mathbb{R}^m$, and
consider the least-squares problem

$$ \min_v \|Pv - q\|_2. \tag{19.19} $$


The purpose of this exercise is to compare the above usual least-squares problem 
with a robust version. 


(a) Suppose the matrix P can take any value in the ellipsoidal uncertainty set

$$ \mathcal{U} = \{ P : \|P - \hat{P}\| \le \rho \}, $$

where $\|P - \hat{P}\|$ is either the operator norm or the Frobenius norm of the
matrix $P - \hat{P}$.
Show that the robust version $\min_v \max_{P \in \mathcal{U}} \|Pv - q\|_2$ of (19.19) is equivalent
to the problem

$$ \min_v \ \|\hat{P}v - q\|_2 + \rho \|v\|_2. \tag{19.20} $$


(b) Set P, q as follows: 
> P = [1, 0 ; 1, 0.001 ; 10, -0.01] ;
> q= [120 ; 
Use the MATLAB command 


>v= P\q 


to find the solution v* to (19.19) for the above values of P, q. 
What is the value of v* and the value of ||Pv* — q||2 that you found? 

(c) Now consider the matrix Q obtained by adding a small random perturbation
to P

> Q = P + 0.05*randn(3,2)

What is the value of $\|Qv^* - q\|_2$ for the solution $v^*$ found in (b)? Repeat
this a few (two or three) times. What do you observe?

(d) Now use the robust formulation (19.20) to find a robust solution $v^*$ to (19.19)
for $\rho = 0.1$.
What is the value of $v^*$ and the value of $\|Pv^* - q\|_2$ that you found? How
different are they from the usual least-squares answers found in part (b)?

(e) Repeat part (c) for the robust solution $v^*$ that you found in (d). How different
is the behavior now?


Exercise 19.3 Consider the convex quadratic inequality 


$$ x^T (AA^T) x - 2b^T x + \gamma \le 0, $$

where the parameters $(A, b, \gamma)$ belong to the uncertainty set

$$ \mathcal{U} = \Big\{ (A, b, \gamma) = (A^0, b^0, \gamma^0) + \sum_{j=1}^{k} u_j (A^j, b^j, \gamma^j) : \|u\|_2 \le 1 \Big\} $$

for some fixed $(A^j, b^j, \gamma^j)$, for $j = 0,1,\ldots,k$.
Show that the (infinite) robust quadratic constraint

$$ x^T (AA^T) x - 2b^T x + \gamma \le 0 \quad \text{for all } (A, b, \gamma) \in \mathcal{U} $$

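holds if and only if there exist $z^j, y^j$, for $j = 0,1,\ldots,k$, and $\lambda$ such that

$$
\begin{array}{ll}
(A^j)^T x = z^j, & j = 0,1,\ldots,k \\
(b^j)^T x = y^j, & j = 0,1,\ldots,k \\
\lambda \ge 0 & \\
\begin{bmatrix}
2y^0 - \gamma^0 - \lambda & (y - \tfrac{1}{2}\gamma)^T & (z^0)^T \\
y - \tfrac{1}{2}\gamma & \lambda I & Z^T \\
z^0 & Z & I
\end{bmatrix} \succeq 0, &
\end{array}
$$

where $y = [y^1 \ \cdots \ y^k]^T$, $\gamma = [\gamma^1 \ \cdots \ \gamma^k]^T$, and $Z = [z^1 \ \cdots \ z^k]$.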


Conclude that the robust version of the optimization problem

$$
\begin{array}{ll}
\min_x & c^T x \\
\text{s.t.} & x^T (AA^T) x - 2b^T x + \gamma \le 0
\end{array}
$$

for the above kind of uncertainty set $\mathcal{U}$ can be written as a semidefinite program.


Nonlinear Programming: Theory and Algorithms


It is sometimes necessary to consider more general nonlinear programs than the 
ones we have already studied: linear, quadratic, or conic programs. We give a 
brief introduction to this vast topic, and we discuss an application to estimating 
a volatility surface. 


Nonlinear Programming 


Consider a very general optimization problem of the form 


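$$
\begin{array}{lll}
\min_x & f(x) & \\
\text{s.t.} & g_j(x) = b_j, & j = 1,\ldots,m \\
 & h_i(x) \le d_i, & i = 1,\ldots,p,
\end{array}
$$

or the equivalent more concise form

$$
\begin{array}{ll}
\min_x & f(x) \\
\text{s.t.} & g(x) = b \\
 & h(x) \le d,
\end{array}
\tag{20.1}
$$

where $f, g_j, h_i : \mathbb{R}^n \to \mathbb{R}$. In the special case when all functions $f, g_j, h_i$ are
linear, problem (20.1) is a linear program as discussed in Chapter 2. When some
of the functions $f, g_j, h_i$ are nonlinear, problem (20.1) is a nonlinear program.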
Many practical problems are naturally formulated as nonlinear programs. We 
already saw quadratic programming and conic programming in earlier chapters. 
However, the family of problems that can be formulated as nonlinear programs is 
enormous, as many, if not most, imaginable kinds of constraints and objectives 
can be cast in terms of nonlinear functions. (For some noteworthy examples, 
see the exercises at the end of the chapter.) The immense modeling power 
of nonlinear programming comes at a cost: unlike linear, quadratic, and conic 
programming, which have a solid theory and are solvable via a few algorithmic 
templates, both the theory and methods to solve general nonlinear programs 
are far more complicated. Different types of nonlinear programs, determined by 
structural properties of the objective and constraint functions, are amenable to 
different types of algorithms. The subsequent sections sketch the main theory 
and most popular algorithmic ideas. A comprehensive treatment of this vast
topic is beyond the scope of this textbook. We refer the reader to the excellent
references Bertsekas (1999), Güler (2010), and Nocedal and Wright (2006) for
more details.


Numerical Nonlinear Programming Solvers 


There are numerous software packages for solving nonlinear programs. The
following are some popular ones. We list them according to the class of algorithms
(discussed in Section 20.4) that they are based on:


1) CONOPT, GRG2, Excel SOLVER. These solvers are based on the generalized 
reduced-gradient method. 

2) MATLAB optimization toolbox, SNOPT, NLPQL. These solvers are based on 
sequential quadratic programming. 

3) MINOS, LANCELOT. These solvers are based on an augmented Lagrangian 
approach. 

4) MOSEK, LOQO, IPOPT. These solvers are based on interior-point methods. 


Optimality Conditions 


In this section we consider the class of nonlinear programs (20.1) where the
functions $f, g_j, h_i$ are once or twice continuously differentiable. The optimality
conditions for linear and convex quadratic programs extend to this more general
context, although some new technicalities arise. In particular, for a general nonlinear
program the theory described below applies to local minima.

Let $\mathcal{X} := \{x \in \mathbb{R}^n : g(x) = b, \ h(x) \le d\}$ denote the feasible set of (20.1).
A point $x^* \in \mathcal{X}$ is a local minimum of (20.1) if there exists $r > 0$ such that
$f(x^*) \le f(x)$ for all $x \in B_r(x^*) \cap \mathcal{X}$, where $B_r(x^*)$ denotes the ball of radius r
around $x^*$; that is,

$$ B_r(x^*) := \{ x \in \mathbb{R}^n : \|x - x^*\| \le r \}. $$

A point $x^* \in \mathcal{X}$ is a strict local minimum of (20.1) if there exists $r > 0$ such that
$f(x^*) < f(x)$ for all $x \in B_r(x^*) \cap \mathcal{X}$ with $x \ne x^*$.


Unconstrained Case 


For ease of exposition we first describe the optimality conditions for the simpler 
case without constraints. Consider the unconstrained optimization problem 


min f(x), (20.2) 


where $f : \mathbb{R}^n \to \mathbb{R}$.

Theorem 20.1 (First-order necessary conditions) Suppose f is continuously
differentiable. If a point $x^* \in \mathbb{R}^n$ is a local minimum of (20.2) then $\nabla f(x^*) = 0$.


The above necessary conditions can be sharpened when the objective function 
is twice differentiable. 


Theorem 20.2 (Second-order necessary and sufficient conditions) Suppose f is
twice continuously differentiable.

(a) If a point $x^* \in \mathbb{R}^n$ is a local minimum of (20.2) then $\nabla f(x^*) = 0$ and
$\nabla^2 f(x^*) \succeq 0$.

(b) If $x^* \in \mathbb{R}^n$ is such that $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*) \succ 0$ then $x^*$ is a strict
local minimum of (20.2).
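
The gap between the necessary condition in (a) and the sufficient condition in (b) can be illustrated with a small classifier for functions of two variables. This is only a sketch: the function name and the returned labels are made up for illustration, and definiteness of the symmetric 2 x 2 Hessian is tested through its determinant and trace.

```python
def classify_stationary_point(grad, hess, tol=1e-12):
    """Apply Theorem 20.2 at a point, given its gradient and 2x2 Hessian."""
    if max(abs(g) for g in grad) > tol:
        return "not stationary"            # first-order condition fails
    (a, b), (_, c) = hess
    det, trace = a * c - b * b, a + c      # product and sum of the eigenvalues
    if det > tol and trace > tol:
        return "strict local minimum"      # Hessian positive definite: part (b)
    if det < -tol or trace < -tol:
        return "not a local minimum"       # a negative eigenvalue violates part (a)
    return "inconclusive"                  # semidefinite but singular Hessian
```

At the origin, $x_1^2 + x_2^2$ has Hessian `((2, 0), (0, 2))` and is classified as a strict local minimum, while $x_1^2 - x_2^2$ has Hessian `((2, 0), (0, -2))` and is rejected. Both $x_1^4 + x_2^4$ and $x_1^3$ have a zero Hessian there, so second-order information alone cannot separate the minimum of the former from the non-minimum of the latter.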


Constrained Case 


Consider now the general constrained problem (20.1). The optimality conditions 
for (20.1) rely on the following technical condition. 


Definition 20.3 Let $x \in \mathcal{X}$. Define $I(x) := \{i : h_i(x) = d_i\}$. The point x
satisfies the linear independence constraint qualification if the set of gradient
vectors

$$ \{\nabla g_j(x) : j = 1,\ldots,m\} \cup \{\nabla h_i(x) : i \in I(x)\} $$

is linearly independent.


Theorem 20.4 (First-order necessary conditions) Suppose $f, g_j, h_i$ are continuously
differentiable. If a point $x^* \in \mathcal{X}$ is a local minimum of (20.1) and satisfies
the linear independence constraint qualification, then there exist some Lagrange
multipliers $y \in \mathbb{R}^m$ and $s \in \mathbb{R}^p$ such that

$$
\begin{array}{l}
\nabla f(x^*) + \sum_{j=1}^{m} y_j \nabla g_j(x^*) + \sum_{i=1}^{p} s_i \nabla h_i(x^*) = 0 \\
s \ge 0 \\
s_i (h_i(x^*) - d_i) = 0, \quad \text{for } i = 1,\ldots,p.
\end{array}
\tag{20.3}
$$

Observe that the first block of equations in (20.3) can be written as

$$ \nabla_x L(x^*, y, s) = \nabla f(x^*) + \nabla g(x^*) y + \nabla h(x^*) s = 0, $$

where $L(x, y, s)$ is the following Lagrangian function for (20.1):

$$ L(x, y, s) = f(x) + y^T (g(x) - b) + s^T (h(x) - d), $$

and where

$$ \nabla g(x) = [\nabla g_1(x) \ \cdots \ \nabla g_m(x)] \quad \text{and} \quad \nabla h(x) = [\nabla h_1(x) \ \cdots \ \nabla h_p(x)]. $$


Observe the nice analogy to the first-order conditions for the unconstrained case.
We next give second-order necessary and sufficient conditions. The precise
statements of the second-order conditions involve the following tangent subspace.
Let $x \in \mathcal{X}$. The tangent subspace $T(x)$ is defined as

$$ T(x) := \{ d \in \mathbb{R}^n : \nabla g_j(x)^T d = 0, \ j = 1,\ldots,m, \ \text{and} \ \nabla h_i(x)^T d = 0, \ i \in I(x) \}. $$


Theorem 20.5 (Second-order necessary conditions) Suppose $f, g_j, h_i$ are twice
continuously differentiable. If a point $x^* \in \mathbb{R}^n$ is a local minimum of (20.1) and satisfies
the linear independence constraint qualification, then there exist $y \in \mathbb{R}^m$ and
$s \in \mathbb{R}^p$ such that (20.3) holds and

$$ d^T \big( \nabla^2_{xx} L(x^*, y, s) \big) d \ge 0 $$

for all $d \in T(x^*)$.


Theorem 20.6 (Second-order sufficient conditions) Suppose $f, g_j, h_i$ are twice
continuously differentiable and $x^* \in \mathcal{X}$ satisfies the linear independence constraint
qualification. If there exist $y \in \mathbb{R}^m$ and $s \in \mathbb{R}^p$ such that (20.3) holds as well as
$h_i(x^*) = d_i \Rightarrow s_i > 0$, and

$$ d^T \big( \nabla^2_{xx} L(x^*, y, s) \big) d > 0 $$

for all non-zero $d \in T(x^*)$, then $x^*$ is a local minimum of (20.1).


Convex Case 


As we detail next, the optimality conditions described above simplify and 
strengthen substantially when the underlying problem is convex. 


Proposition 20.7 Assume the objective function f in (20.2) is convex and
differentiable. Then $x^*$ is an optimal solution to (20.2) if and only if $\nabla f(x^*) = 0$.


Proposition 20.8 Assume the objective function f in (20.1) is convex, the
equality constraint functions $g_j$ are linear, and the inequality constraint functions
$h_i$ are convex. Furthermore assume all $f, g_j, h_i$ are differentiable. Then $x^*$ is an
optimal solution to (20.1) if and only if there exist $y \in \mathbb{R}^m$, $s \in \mathbb{R}^p$ such that

$$ \nabla_x L(x^*, y, s) = 0, \quad g(x^*) = b, \quad h(x^*) \le d, \quad s \ge 0, \quad s^T (h(x^*) - d) = 0. $$
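
The conditions in Proposition 20.8 can be verified by hand on a small convex example, and the check is easy to script. The toy problem below is an assumption made for illustration: minimize $(x_1 - 2)^2 + (x_2 - 1)^2$ subject to $x_1 + x_2 = 2$ and $x_1 \le 1$. The function `kkt_residuals` is a hypothetical helper that returns how far a candidate $(x, y, s)$ is from satisfying each condition.

```python
def kkt_residuals(x, y, s):
    """Residuals of the optimality conditions for the toy problem
        min (x1 - 2)^2 + (x2 - 1)^2   s.t.  x1 + x2 = 2,  x1 <= 1.
    Returns (stationarity, primal infeasibility, dual infeasibility,
    complementarity); all four are zero at a solution.
    """
    grad_f = (2.0 * (x[0] - 2.0), 2.0 * (x[1] - 1.0))
    grad_g = (1.0, 1.0)   # gradient of g(x) = x1 + x2
    grad_h = (1.0, 0.0)   # gradient of h(x) = x1
    stationarity = max(abs(grad_f[i] + y * grad_g[i] + s * grad_h[i])
                       for i in range(2))
    primal = max(abs(x[0] + x[1] - 2.0), max(0.0, x[0] - 1.0))
    dual = max(0.0, -s)
    complementarity = abs(s * (x[0] - 1.0))
    return stationarity, primal, dual, complementarity
```

The point $x^* = (1, 1)$ with multipliers $y = 0$, $s = 2$ gives four zero residuals, so it is optimal. The point $(1.5, 0.5)$ with $y = 1$, $s = 0$, which minimizes the objective on the equality constraint alone, is stationary but violates $x_1 \le 1$ by 0.5.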


Algorithms 


Unconstrained Case 


We next describe three main algorithmic approaches to solving an unconstrained 
optimization problem of the form (20.2), namely the gradient descent method, 
Newton’s method, and the subgradient method. These methods in turn provide the 
foundation for the more elaborate methods for solving constrained optimization 
problems.


Gradient Descent

Suppose the objective function f in (20.2) is differentiable. In this case a simple 
method for solving (20.2) is based on going downhill on the graph of the function 
f. The gradient gives the direction of fastest initial increase and thus its negative 
is the direction of fastest initial decrease. This can also be motivated by the first- 
order Taylor approximation of f around a point x: for p small we have 


$$ f(x + p) \approx f(x) + \nabla f(x)^T p. $$


Among all p of fixed norm, the one pointing in the direction $-\nabla f(x)$ minimizes
the right-hand side.
Algorithm 20.1 gives a formal description of the gradient descent method. 


Algorithm 20.1 Gradient descent method
1: choose $x^0 \in \mathbb{R}^n$
2: for k = 0,1,... do
3: choose a step length $\alpha_k > 0$ and set $x^{k+1} = x^k - \alpha_k \nabla f(x^k)$
4: end for


The choice of step length $\alpha_k$ is a critical detail in the implementation of the
gradient descent algorithm. If $\alpha_k$ is too large, the algorithm may fail to converge to
a solution because the objective value could even increase after one iteration. On
the other hand, if $\alpha_k$ is too small, the algorithm will be too slow. This issue applies
not only to the gradient descent method but to any method that aims to move
along a direction p. Suppose $p \in \mathbb{R}^n$ is a descent direction at the current point
$x^k$; that is, $\nabla f(x^k)^T p < 0$. A popular approach is to choose the step length large
enough and perform backtracking; that is, shrink $\alpha_k$ by a multiplicative constant
smaller than one until the following sufficient decrease condition holds for some
predetermined $\mu \in (0, 1)$:

$$ f(x^k + \alpha_k p) \le f(x^k) + \alpha_k \cdot \mu \cdot \nabla f(x^k)^T p. \tag{20.4} $$

The sufficient decrease requirement in (20.4) is called the Armijo-Goldstein
condition. The first-order Taylor approximation for f around $x^k$ ensures that
(20.4) holds for sufficiently small $\alpha_k$ provided f is differentiable and p is a
descent direction. Algorithm 20.2 describes this kind of backtracking. This type
of backtracking is also often called line search.


Algorithm 20.2 Backtracking to select the step length $\alpha_k$
1: choose $\alpha_k > 0$ and $\beta, \mu \in (0, 1)$
2: while (20.4) fails do $\alpha_k = \beta \cdot \alpha_k$
3: end while
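
Algorithms 20.1 and 20.2 combine into a few lines of code. The following is a minimal sketch in plain Python, assuming the caller supplies `f` and its gradient `grad` as functions on lists; the default values of `alpha`, `beta`, `mu`, the stopping tolerance, and the function names are illustrative choices rather than recommendations from the text.

```python
def backtracking(f, x, g, p, alpha=1.0, beta=0.5, mu=1e-4):
    """Algorithm 20.2: shrink alpha until the sufficient decrease test (20.4) holds."""
    fx = f(x)
    slope = sum(gi * pi for gi, pi in zip(g, p))  # grad f(x)^T p, negative for descent
    while f([xi + alpha * pi for xi, pi in zip(x, p)]) > fx + alpha * mu * slope:
        alpha *= beta
    return alpha


def gradient_descent(f, grad, x0, tol=1e-8, max_iter=10_000):
    """Algorithm 20.1 with step lengths chosen by backtracking."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if max(abs(gi) for gi in g) <= tol:   # gradient numerically zero: stop
            break
        p = [-gi for gi in g]                 # steepest descent direction
        alpha = backtracking(f, x, g, p)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
    return x
```

For example, applied to $f(x) = (x_1 - 1)^2 + 10 (x_2 + 2)^2$ from the starting point (0, 0), the iterates approach the minimizer (1, -2). The full step $\alpha_k = 1$ is always rejected on this function, which shows why a fixed large step length would fail to converge.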

Newton’s Method

The gradient descent method uses only first-order information to choose the
descent direction at each main iteration. There are several approaches to incorporate
additional information and speed up convergence. Newton’s method yields
a substantially improved direction by incorporating second-order information. In
its pure form each step of Newton’s method for (20.2) updates a trial point x to
the new point

$$ x^+ = x - \nabla^2 f(x)^{-1} \nabla f(x). $$


The latter update can be motivated by considering the second-order Taylor 
approximation to f around x: 


$$ f(x + p) \approx f(x) + \nabla f(x)^T p + \tfrac{1}{2} p^T \nabla^2 f(x) p. $$


Observe that when $\nabla^2 f(x) \succ 0$ the Newton step $p := -\nabla^2 f(x)^{-1} \nabla f(x)$ minimizes
the right-hand side.
Newton’s method also applies to solving nonlinear equations. Consider the 
system of nonlinear equations 
F(x)=0 (20.5) 


where $F : \mathbb{R}^n \to \mathbb{R}^n$ is a differentiable function. Let $F'(x)$ denote the Jacobian
matrix of F, that is, the $n \times n$ matrix with $(i, j)$ component

$$ F'(x)_{ij} = \frac{\partial f_i(x)}{\partial x_j}, $$

where $f_1(x), \ldots, f_n(x)$ are the components of $F(x)$.


In its pure form, Newton’s method for (20.5) updates a trial point x to the
new point

$$ x^+ = x - F'(x)^{-1} F(x). $$


The latter update can be motivated by considering the first-order Taylor approx- 
imation to F around x: 


$$ F(x + p) \approx F(x) + F'(x) p. $$


The Newton step $p = -F'(x)^{-1} F(x)$ makes the above right-hand side equal to
zero.

Observe that Newton’s method for the unconstrained optimization problem
(20.2) is exactly the same as Newton’s method for solving the system of nonlinear
equations $\nabla f(x) = 0$.
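
The pure Newton iteration for a system (20.5) is short enough to write out for two equations in two unknowns. The sketch below is an illustration only: the example system (a circle of radius 2 intersected with the line $x_1 = x_2$), the names `F`, `J`, `newton_system`, and the tolerances are all assumptions, and the 2 x 2 linear system for the Newton step is solved by Cramer's rule to keep the code dependency-free.

```python
def F(x):
    """Example system: x1^2 + x2^2 - 4 = 0 and x1 - x2 = 0."""
    return (x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1])


def J(x):
    """Jacobian F'(x) of the example system."""
    return ((2.0 * x[0], 2.0 * x[1]), (1.0, -1.0))


def newton_system(F, J, x0, tol=1e-12, max_iter=50):
    """Pure Newton iteration x+ = x - F'(x)^{-1} F(x) for a 2x2 system."""
    x = list(x0)
    for _ in range(max_iter):
        f1, f2 = F(x)
        if max(abs(f1), abs(f2)) <= tol:
            break
        (a, b), (c, d) = J(x)
        det = a * d - b * c                   # assumed non-zero along the iterates
        # Solve F'(x) p = -F(x) by Cramer's rule.
        p1 = (-f1 * d + b * f2) / det
        p2 = (-a * f2 + c * f1) / det
        x = [x[0] + p1, x[1] + p2]
    return x
```

Starting from (1, 2), the first step lands on (1.5, 1.5) and the iterates then converge rapidly to $(\sqrt{2}, \sqrt{2})$. No step length is used here, so convergence relies on the starting point being reasonably close to a solution.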

Newton’s method has a much faster rate of convergence than gradient descent 
provided the initial iterate is sufficiently close to the solution. On the other 
hand, when the initial iterate is far from the solution, the above pure form of 
Newton’s method may fail to converge. The latter drawback can be rectified by 
performing some backtracking along the Newton step direction as described in 
Algorithm 20.3. The step length $\alpha_k$ can be chosen via the backtracking procedure
described in Algorithm 20.2 to ensure the Armijo-Goldstein sufficient decrease
condition (20.4) holds. For the Newton direction $p = -\nabla^2 f(x^k)^{-1} \nabla f(x^k)$, a
natural and customary initial step length at each step is $\alpha_k = 1$.


Algorithm 20.3 Newton’s method with backtracking
1: choose $x^0 \in \mathbb{R}^n$
2: for k = 0,1,... do
3: choose a step length $\alpha_k \in (0, 1]$ via backtracking and set $x^{k+1} = x^k - \alpha_k \nabla^2 f(x^k)^{-1} \nabla f(x^k)$
4: end for
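
The effect of backtracking in Algorithm 20.3 shows up clearly on a standard one-dimensional test function. The sketch below assumes $f(x) = \sqrt{1 + x^2}$, chosen here for illustration because the pure Newton update reduces to $x^+ = -x^3$, which diverges whenever $|x| > 1$; the parameter values and the function name `newton` are likewise illustrative.

```python
import math


def f(x):
    return math.sqrt(1.0 + x * x)


def df(x):
    return x / math.sqrt(1.0 + x * x)


def d2f(x):
    return (1.0 + x * x) ** -1.5


def newton(x, damped, tol=1e-10, max_iter=50, beta=0.5, mu=1e-4):
    """Newton's method in one variable, optionally with backtracking."""
    for _ in range(max_iter):
        g = df(x)
        if abs(g) <= tol:
            break
        p = -g / d2f(x)                       # Newton direction
        alpha = 1.0                           # customary initial step length
        if damped:
            # Shrink alpha until the sufficient decrease condition (20.4) holds.
            while f(x + alpha * p) > f(x) + alpha * mu * g * p:
                alpha *= beta
        x = x + alpha * p
    return x
```

From the starting point 2, the pure iteration produces approximately -8, 512, and so on, moving away from the minimizer 0. With backtracking, the first step is cut to $\alpha_0 = 0.25$ and lands at -0.5; full Newton steps are accepted from then on and the iterates converge quickly to 0.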


Subgradient Method 
In the special case when the objective function f is convex, the gradient descent 
method can be extended to non-smooth functions; that is, functions that are 
not necessarily differentiable. Non-smooth functions arise often in optimization. 
In particular, the Lagrangian relaxation heuristic for (8.9) described in Section 
8.3.3 yields the minimization of a non-smooth convex function. 

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function. A point $g \in \mathbb{R}^n$ is a subgradient of $f$ at
$x \in \mathbb{R}^n$ if for all $y \in \mathbb{R}^n$

$$f(y) - f(x) \ge g^T (y - x).$$

The subdifferential of $f$ at $x$, denoted $\partial f(x)$, is the set of subgradients of $f$ at $x$.

The subdifferential of a convex function is non-empty at every point. The 
following example illustrates the subdifferential of a simple non-smooth function. 
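Consider the convex function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = |x|$. In this case we
have

$$\partial f(x) = \begin{cases} \{1\} & \text{if } x > 0 \\ \{-1\} & \text{if } x < 0 \\ [-1, 1] & \text{if } x = 0. \end{cases}$$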


Algorithm 20.4 describes the subgradient method for (20.2) when f is a convex 
function. Observe that it is a natural extension of Algorithm 20.1. 


Algorithm 20.4 Subgradient method 
1: choose $x^0 \in \mathbb{R}^n$
2: for $k = 0, 1, \ldots$ do
3:    choose $g_k \in \partial f(x^k)$ and a step length $\alpha_k > 0$, and set $x^{k+1} = x^k - \alpha_k g_k$
4: end for


For non-smooth functions, the step length $\alpha_k$ for the subgradient
method cannot be chosen via a backtracking procedure, as the Armijo-Goldstein
condition (20.4) cannot be guaranteed in the absence of differentiability. Various
choices have been proposed in the literature. The following two generic types of
step lengths are particularly simple and popular. The first one is to choose fixed
sizes $\alpha_k = \alpha > 0$ for all $k$. The second one is to choose slowly diminishing sizes
such that

$$\sum_{k=0}^{\infty} \alpha_k^2 < \infty, \qquad \sum_{k=0}^{\infty} \alpha_k = \infty.$$
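As an illustration, the sketch below runs the subgradient method with the diminishing step lengths $\alpha_k = 1/(k+1)$, which satisfy both conditions above. The objective $f(x) = \|x - a\|_1$, the vector `a`, the function name, and the iteration count are hypothetical choices for this example. Because the subgradient method is not a descent method, the sketch keeps track of the best iterate seen so far.

```python
import numpy as np

def subgradient_method(f, subgrad, x0, iters=5000):
    """Subgradient method with diminishing steps alpha_k = 1/(k+1) (sketch)."""
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), f(x)
    for k in range(iters):
        alpha = 1.0 / (k + 1)          # sum alpha_k = inf, sum alpha_k^2 < inf
        x = x - alpha * subgrad(x)
        fx = f(x)
        if fx < best_f:                # f may increase, so remember the best point
            best_x, best_f = x.copy(), fx
    return best_x, best_f

# Hypothetical non-smooth convex objective f(x) = ||x - a||_1.
a = np.array([1.0, -2.0])
f = lambda x: np.sum(np.abs(x - a))
# np.sign returns 0 at a kink, which is a valid element of [-1, 1].
subgrad = lambda x: np.sign(x - a)
x_best, f_best = subgradient_method(f, subgrad, [0.0, 0.0])
```

The iterates oscillate around the minimizer `a` with shrinking amplitude, which is the typical behavior of diminishing step lengths on a non-smooth function.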


Constrained Case 


Generalized Reduced Gradient 

The main idea behind the generalized reduced gradient method is to reduce a 
constrained problem to a sequence of unconstrained problems in a space of 
lower dimension. To illustrate this procedure, consider the special case when 
the equality constraints are linear: 


$$\begin{array}{ll} \min & f(x) \\ \text{s.t.} & Ax = b \end{array} \tag{20.6}$$


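for some $A \in \mathbb{R}^{m \times n}$. Without loss of generality we may assume that $A$ has
full row rank as otherwise either some constraints are redundant or the problem
is infeasible. Since $A$ has full row rank, we can partition both $A$ and $x$ as follows:

$$A = \begin{bmatrix} A_B & A_N \end{bmatrix} \quad \text{and} \quad x = \begin{bmatrix} x_B \\ x_N \end{bmatrix}$$

for some subset $B \subseteq \{1, \ldots, n\}$ such that $A_B$ is non-singular. Therefore

$$Ax = b \;\Leftrightarrow\; A_B x_B + A_N x_N = b \;\Leftrightarrow\; x_B = A_B^{-1}(b - A_N x_N).$$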


Consequently, problem (20.6) is equivalent to the following reduced space unconstrained minimization problem:

$$\min_{x_N} \; \tilde{f}(x_N)$$

where

$$\tilde{f}(x_N) = f\big(A_B^{-1}(b - A_N x_N),\, x_N\big).$$

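The elimination of $x_B$ can be sketched as follows. This is an illustrative example under our own assumptions: the reduced problem is minimized with plain fixed-step gradient descent, the gradient of the reduced objective is obtained from the chain rule as $\nabla_N f - (A_B^{-1} A_N)^T \nabla_B f$, and the function name, the step size, and the test data are hypothetical.

```python
import numpy as np

def reduced_gradient_descent(grad_f, A, b, basis, x_N0, step, iters=500):
    """Minimize f subject to Ax = b by eliminating the basic variables x_B."""
    n = A.shape[1]
    nonbasis = [j for j in range(n) if j not in basis]
    A_B, A_N = A[:, basis], A[:, nonbasis]
    M = np.linalg.solve(A_B, A_N)          # A_B^{-1} A_N
    x_B_offset = np.linalg.solve(A_B, b)   # A_B^{-1} b

    def assemble(x_N):
        # Rebuild the full vector x from the free variables x_N.
        x = np.empty(n)
        x[basis] = x_B_offset - M @ x_N
        x[nonbasis] = x_N
        return x

    x_N = np.asarray(x_N0, dtype=float)
    for _ in range(iters):
        g = grad_f(assemble(x_N))
        # Chain rule: gradient of the reduced objective with respect to x_N.
        g_reduced = g[nonbasis] - M.T @ g[basis]
        x_N = x_N - step * g_reduced
    return assemble(x_N)

# Hypothetical data: minimize 0.5 * ||x||^2 subject to x1 + x2 + x3 = 3.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([3.0])
grad_f = lambda x: x
x_opt = reduced_gradient_descent(grad_f, A, b, basis=[0],
                                 x_N0=[0.0, 0.0], step=0.3)
```

Every iterate returned by `assemble` satisfies $Ax = b$ by construction, so the iteration only has to drive the reduced gradient to zero. For this data the minimizer is $(1, 1, 1)$.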
Consider a more general program with nonlinear equality constraints: 


$$\begin{array}{ll} \min & f(x) \\ \text{s.t.} & g(x) = b. \end{array} \tag{20.7}$$


We can extend the above approach by approximating the nonlinear equality 
constraints with their first-order Taylor approximation. More precisely, suppose 
the current point is $x^k$. Consider the modification of (20.7) obtained by replacing
$g(x) = b$ with its first-order Taylor approximation

$$\begin{array}{ll} \min & f(x) \\ \text{s.t.} & g(x^k) + \nabla g(x^k)^T (x - x^k) = b. \end{array} \tag{20.8}$$

Observe that the latter problem is of the form (20.6) and is thus amenable to
the type of reduced space approach described above. Algorithm 20.5 describes a
template for a generalized reduced gradient approach to problem (20.7). The step
length $\alpha$ at each iteration is typically chosen to balance both goals of objective
function reduction and constraint satisfaction.


Algorithm 20.5 Generalized reduced gradient 
1: choose $x^0$
2: for $k = 0, 1, \ldots$ do
3:    solve the linearized constraints problem (20.8) to find a search direction $\Delta x^k$
4:    choose a step length $\alpha > 0$ and set $x^{k+1} = x^k + \alpha \Delta x^k$
5: end for


The generalized reduced gradient approach can be extended to deal with 
inequality constraints as well via an active-set approach like that discussed in 
Chapter 5. The basic idea is that the active inequalities can be treated as equality 
constraints. The challenge of course is to determine the correct set of active 
inequalities at the optimal solution. 


Sequential Quadratic Programming 

The central idea of sequential quadratic programming is to capitalize on algorithms for quadratic programming to solve more general nonlinear programming
problems of the form (20.1). Given a current iterate $x^k$, problem (20.1) can be
approximated with the following quadratic program:

$$\begin{array}{ll} \min & f(x^k) + \nabla f(x^k)^T (x - x^k) + \frac{1}{2}(x - x^k)^T B_k (x - x^k) \\ \text{s.t.} & g(x^k) + \nabla g(x^k)^T (x - x^k) = b \end{array} \tag{20.9}$$

where

$$B_k = \nabla^2_{xx} L(x^k, y^k, s^k)$$

is the Hessian of the Lagrangian function with respect to the $x$ variables and
$(y^k, s^k)$ is the current estimate of the vector of Lagrange multipliers.

Algorithm 20.6 describes a template for a sequential quadratic programming
approach to problem (20.7). Once again, the step length $\alpha$ at each iteration
is typically chosen to balance both goals of objective function reduction and
constraint satisfaction.
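For the equality constrained problem (20.7), the quadratic program solved at each iteration reduces to one linear system, which makes a compact sketch possible. The code below is an illustration under several assumptions of ours: there are no inequality constraints, the Lagrangian is taken as $f(x) + y^T(g(x) - b)$, full steps $\alpha = 1$ are used instead of a line search, and the function name and the test problem are hypothetical.

```python
import numpy as np

def sqp_equality(grad_f, hess_lag, g, jac_g, b, x0, y0, iters=20):
    """SQP sketch for min f(x) s.t. g(x) = b, taking full steps (alpha = 1)."""
    x = np.asarray(x0, dtype=float)
    y = np.asarray(y0, dtype=float)
    n, m = x.size, y.size
    for _ in range(iters):
        B = hess_lag(x, y)    # Hessian of the Lagrangian in x
        Jg = jac_g(x)         # m x n Jacobian; row i is the gradient of g_i
        # Optimality conditions of the QP subproblem as one linear system:
        #   B dx + Jg^T y_qp = -grad f(x),   Jg dx = b - g(x).
        K = np.block([[B, Jg.T], [Jg, np.zeros((m, m))]])
        rhs = np.concatenate([-grad_f(x), b - g(x)])
        sol = np.linalg.solve(K, rhs)
        dx, y_qp = sol[:n], sol[n:]
        alpha = 1.0           # assumption: no line search in this sketch
        x = x + alpha * dx
        y = y + alpha * (y_qp - y)
    return x, y

# Hypothetical test problem: min x1 + x2 s.t. x1^2 + x2^2 = 2.
grad_f = lambda x: np.array([1.0, 1.0])
g = lambda x: np.array([x[0] ** 2 + x[1] ** 2])
jac_g = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])
hess_lag = lambda x, y: 2.0 * y[0] * np.eye(2)
x_sqp, y_sqp = sqp_equality(grad_f, hess_lag, g, jac_g, np.array([2.0]),
                            x0=[-1.5, -0.5], y0=[1.0])
```

Started near the solution, the iterates converge to $x = (-1, -1)$ with multiplier $y = 1/2$. Without a globalization strategy such as the step length rule mentioned above, convergence from a remote starting point is not guaranteed.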


Interior-Point Methods 

Interior-point methods, previously discussed in Chapters 2 and 5, can be extended
to general nonlinear programming under suitable differentiability conditions. The
gist of the method is to solve the optimality conditions (20.3).

Algorithm 20.6 Sequential quadratic programming

1: choose $x^0, y^0, s^0$
2: for $k = 0, 1, \ldots$ do
3:    solve the quadratic program (20.9) to find a search direction $(\Delta x^k, \Delta y^k, \Delta s^k)$
4:    choose a step length $\alpha > 0$ and set $(x^{k+1}, y^{k+1}, s^{k+1}) = (x^k, y^k, s^k) + \alpha(\Delta x^k, \Delta y^k, \Delta s^k)$
5: end for


Similar to the linear and quadratic programming cases, interior-point methods 
generate a sequence of iterates that satisfy some inequalities strictly and each 
iteration of the algorithm aims to make progress towards satisfying the optimality 
conditions (20.3). The algorithm inevitably becomes a bit more elaborate for 
nonlinear programs because of the nonlinearities in the constraints. 

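As before we use the following notational convention: given a vector $s \in \mathbb{R}^p$,
let $S \in \mathbb{R}^{p \times p}$ denote the diagonal matrix defined by $S_{ii} = s_i$, for $i = 1, \ldots, p$,
and let $\mathbf{1} \in \mathbb{R}^p$ denote the vector whose components are all 1s. The optimality
conditions (20.3) can be restated as

$$\begin{bmatrix} \nabla f(x) + \nabla g(x) y + \nabla h(x) s \\ g(x) - b \\ h(x) + z - d \\ SZ\mathbf{1} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad s, z \ge 0.$$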


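Given $\mu > 0$, let $(x(\mu), y(\mu), z(\mu), s(\mu))$ be the solution to the following
perturbed version of the above optimality conditions:

$$\begin{bmatrix} \nabla f(x) + \nabla g(x) y + \nabla h(x) s \\ g(x) - b \\ h(x) + z - d \\ SZ\mathbf{1} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \mu\mathbf{1} \end{bmatrix}, \qquad s, z > 0.$$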


The first condition above can be written as $r_\mu(x, y, z, s) = 0$ for the residual
vector:

$$r_\mu(x, y, z, s) := \begin{bmatrix} \nabla f(x) + \nabla g(x) y + \nabla h(x) s \\ g(x) - b \\ h(x) + z - d \\ SZ\mathbf{1} - \mu\mathbf{1} \end{bmatrix}.$$


The central path is the set $\{(x(\mu), y(\mu), z(\mu), s(\mu)) : \mu > 0\}$. Under suitable
assumptions $(x(\mu), y(\mu), z(\mu), s(\mu))$ converges to a local optimal solution to
(20.3) as $\mu \to 0$. This suggests the following algorithmic strategy: Suppose $(x, y, z, s)$ is
“near” $(x(\mu), y(\mu), z(\mu), s(\mu))$ for some $\mu > 0$. Use $(x, y, z, s)$ to move to a better
point $(x^+, y^+, z^+, s^+)$ “near” $(x(\mu^+), y(\mu^+), z(\mu^+), s(\mu^+))$ for some $\mu^+ < \mu$.

It can be shown that if a point $(x, y, z, s)$ is on the central path, then the
corresponding value of $\mu$ satisfies $z^T s = p\mu$. Likewise, given $z, s > 0$, define

$$\mu(z, s) := \frac{z^T s}{p}.$$

To move from a current point $(x, y, z, s)$ to a new point, we use the Newton
step for the nonlinear system of equations $r_\mu(x, y, z, s) = 0$; that is,

$$(\Delta x, \Delta y, \Delta z, \Delta s) = -r_\mu'(x, y, z, s)^{-1}\, r_\mu(x, y, z, s). \tag{20.10}$$


Algorithm 20.7 presents a template for an interior-point method. 


Algorithm 20.7 Interior-point method for nonlinear programming 


1: choose $x^0, y^0$ and $z^0, s^0 > 0$
2: for $k = 0, 1, \ldots$ do
3:    solve the Newton system (20.10) for $(x, y, z, s) = (x^k, y^k, z^k, s^k)$ and $\mu := 0.1\,\mu(z^k, s^k)$
4:    choose a step length $\alpha \in (0, 1]$ and set $(x^{k+1}, y^{k+1}, z^{k+1}, s^{k+1}) = (x^k, y^k, z^k, s^k) + \alpha(\Delta x, \Delta y, \Delta z, \Delta s)$
5: end for


The step length $\alpha$ in step 4 should be chosen via a backtracking procedure
so that $z^{k+1}, s^{k+1} > 0$ and the size of $r_\mu(x^{k+1}, y^{k+1}, z^{k+1}, s^{k+1})$ is sufficiently
smaller than that of $r_\mu(x^k, y^k, z^k, s^k)$.
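A compact sketch of this template is given below for a simplified setting. The simplifications are ours: there are no equality constraints, the inequality constraints are linear ($Gx \le d$, so that $h(x) = Gx$ and the Hessian of the Lagrangian equals $\nabla^2 f$), the unknowns are ordered as $(x, z, s)$, and the acceptance test $\|r_\mu^{\text{new}}\| \le (1 - 0.01\alpha)\|r_\mu\|$ is one hypothetical way to make "sufficiently smaller" concrete. The function name and test data are also hypothetical.

```python
import numpy as np

def interior_point(grad_f, hess_f, G, d, x0, iters=100, tol=1e-9):
    """Interior-point sketch for min f(x) s.t. G x <= d (no equalities)."""
    p, n = G.shape
    x = np.asarray(x0, dtype=float)
    z = np.ones(p)    # slack variables, kept strictly positive
    s = np.ones(p)    # multipliers, kept strictly positive

    def residual(x, z, s, mu):
        return np.concatenate([grad_f(x) + G.T @ s,   # stationarity
                               G @ x + z - d,         # h(x) + z - d
                               s * z - mu])           # SZ1 - mu 1

    for _ in range(iters):
        if np.linalg.norm(residual(x, z, s, 0.0)) < tol:
            break
        mu = 0.1 * (z @ s) / p
        r = residual(x, z, s, mu)
        # Jacobian of the residual with respect to (x, z, s).
        jac = np.block([
            [hess_f(x),        np.zeros((n, p)), G.T],
            [G,                np.eye(p),        np.zeros((p, p))],
            [np.zeros((p, n)), np.diag(s),       np.diag(z)],
        ])
        step = np.linalg.solve(jac, -r)
        dx, dz, ds = step[:n], step[n:n + p], step[n + p:]
        alpha = 1.0
        while alpha > 1e-12:
            z_new, s_new = z + alpha * dz, s + alpha * ds
            if z_new.min() > 0 and s_new.min() > 0:
                r_new = residual(x + alpha * dx, z_new, s_new, mu)
                if np.linalg.norm(r_new) <= (1 - 0.01 * alpha) * np.linalg.norm(r):
                    break
            alpha *= 0.5
        x, z, s = x + alpha * dx, z + alpha * dz, s + alpha * ds
    return x, z, s

# Hypothetical data: min (x1-2)^2 + (x2-2)^2 s.t. x1 + x2 <= 2, x1 <= 1.5.
center = np.array([2.0, 2.0])
grad_f = lambda x: 2.0 * (x - center)
hess_f = lambda x: 2.0 * np.eye(2)
G = np.array([[1.0, 1.0], [1.0, 0.0]])
d = np.array([2.0, 1.5])
x_ip, z_ip, s_ip = interior_point(grad_f, hess_f, G, d, x0=[0.0, 0.0])
```

For this data the first constraint is active at the solution $x = (1, 1)$ with multiplier 2, while the second constraint is inactive and its multiplier tends to zero. The iterates keep $z$ and $s$ strictly positive throughout.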


Estimating a Volatility Surface 


We conclude this chapter with a description of nonlinear programming to esti- 
mate the volatility surface. The discussion in this section is based on Coleman 
et al. (1999a,b). 

The Black-Scholes-Merton (BSM) equation for pricing European options is
based on a geometric Brownian motion model for the movements of the underlying security. Namely, one assumes that the underlying security price $S_t$ at time
$t$ satisfies

$$\frac{dS_t}{S_t} = \mu\,dt + \sigma\,dW_t, \tag{20.11}$$

where $\mu$ is the drift, $\sigma$ is the (constant) volatility, and $W_t$ is the standard
Brownian motion. Using this equation and some standard assumptions about
the absence of frictions and arbitrage opportunities, one can derive the BSM 
partial differential equation for the value of a European option on this underlying 
security. Using the boundary conditions resulting from the payoff structure of the 
particular option, one determines the value function for the option. For example, 
for the European call and put options with strike $K$ and maturity $T$, we obtain
the following formulas:

$$C(K, T) = S_0 \Phi(d_1) - K e^{-rT} \Phi(d_2), \tag{20.12}$$

$$P(K, T) = K e^{-rT} \Phi(-d_2) - S_0 \Phi(-d_1), \tag{20.13}$$

where

$$d_1 = \frac{\log(S_0/K) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T},$$

and $\Phi(\cdot)$ is the cumulative distribution function for the standard normal distribution. In these formulas $r$ represents the constant, continuously compounded risk-free
interest rate and $\sigma$ is the volatility of the underlying security, which is
assumed to be constant.

The risk-free interest rate r, or a reasonably close approximation to it, is often 
available, for example from Treasury bill prices in US markets. Therefore, all one 
needs to determine the call or put price using these formulas is a reliable estimate 
of the volatility parameter $\sigma$. Conversely, given the market price for a particular
European call or put, one can uniquely determine the implied volatility of the
underlying security (implied by this option price) by solving the equations above
with the unknown $\sigma$.
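The computation of an implied volatility can be sketched with the call formula (20.12) and a bisection search, which applies because the BSM call price is increasing in $\sigma$. The function names, the search bracket, the tolerance, and the numerical inputs below are assumptions made for this illustration. In particular, the "market price" is generated from the formula itself so that the answer is known.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bsm_call(S0, K, T, r, sigma):
    """European call price according to formula (20.12)."""
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def implied_vol(price, S0, K, T, r, lo=1e-6, hi=5.0, tol=1e-10):
    """Bisection on sigma; relies on the call price increasing in sigma."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bsm_call(S0, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical inputs: price a call at sigma = 0.25, then recover sigma.
price = bsm_call(100.0, 105.0, 0.5, 0.02, 0.25)
sigma_implied = implied_vol(price, 100.0, 105.0, 0.5, 0.02)
```

The search assumes that the given price lies between the call values at the two ends of the bracket. A price outside that range has no implied volatility within the bracket, and the sketch would simply return a value close to one of the bracket ends.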

Empirical evidence against the appropriateness of (20.11) as a model for the 
movements of most securities is abundant. Most such studies refute the assump- 
tion of a volatility that does not depend on time or underlying price level. Indeed, 
studying the prices of options with the same maturity but different strikes,
researchers observed that the implied volatilities for such options exhibited a
“smile” structure, i.e., higher implied volatilities away from the money in both
directions, decreasing to a minimum level as one approaches the at-the-money
option from either side. This is clearly in contrast with the constant (flat)
implied volatilities one would expect had (20.11) been an appropriate model for
the underlying price process.

There are quite a few models that try to capture the volatility smile, including 
stochastic volatility models, jump diffusions, etc. Since these models introduce 
non-traded sources of risk, perfect replication via dynamic hedging as in the BSM 
approach becomes impossible and the pricing problem is more complicated. An 
alternative that is explored in Coleman et al. (1999b) is the one-factor continuous 
diffusion model: 


$$\frac{dS_t}{S_t} = \mu(S_t, t)\,dt + \sigma(S_t, t)\,dW_t, \qquad t \in [0, T], \tag{20.14}$$

where the constant parameters $\mu$ and $\sigma$ of (20.11) are replaced by continuous
and differentiable functions $\mu(S_t, t)$ and $\sigma(S_t, t)$ of the underlying price $S_t$ and
time $t$. Here $T$ denotes the end of the fixed time horizon. If the instantaneous
risk-free interest rate $r$ is assumed constant and the dividend rate is constant,
given a function $\sigma(S, t)$, a European call option with maturity $T$ and strike $K$
has a unique price. Let us denote this price with $C(\sigma(S, t), K, T)$.

While an explicit solution for the price function $C(\sigma(S, t), K, T)$ as in (20.12)
is no longer possible, the resulting pricing problem can be solved efficiently via
numerical techniques. Since $\mu(S, t)$ does not appear in the generalized BSM
partial differential equation, all one needs is the specification of the function
$\sigma(S, t)$ and a good numerical scheme to determine the option prices in this
generalized framework.

So, how does one specify the function $\sigma(S, t)$? First of all, this function should
be consistent with the observed prices of currently or recently traded options on
the same underlying security. If we assume that we are given market prices of
$n$ call options with strikes $K_j$ and maturities $T_j$ in the form of bid-ask pairs
$(\beta_j, \alpha_j)$ for $j = 1, \ldots, n$, it would be reasonable to require that the volatility
function $\sigma(S, t)$ is chosen so that

$$\beta_j \le C(\sigma(S, t), K_j, T_j) \le \alpha_j, \qquad j = 1, \ldots, n. \tag{20.15}$$

To ensure that (20.15) is satisfied as closely as possible, one strategy is to
minimize the violations of the inequalities in (20.15):

$$\min_{\sigma(S,t) \in \mathcal{H}} \; \sum_{j=1}^{n} \Big( [\beta_j - C(\sigma(S, t), K_j, T_j)]^+ + [C(\sigma(S, t), K_j, T_j) - \alpha_j]^+ \Big). \tag{20.16}$$
Above, $\mathcal{H}$ denotes the space of measurable functions $\sigma(S, t)$ with domain $\mathbb{R}_+ \times
[0, T]$ and $[u]^+ = \max\{u, 0\}$. Alternatively, using the closing prices $C_j$ for the
options under consideration, or choosing the mid-market prices $C_j = (\beta_j + \alpha_j)/2$,
we can solve the following nonlinear least-squares problem:

$$\min_{\sigma(S,t) \in \mathcal{H}} \; \sum_{j=1}^{n} \big(C(\sigma(S, t), K_j, T_j) - C_j\big)^2. \tag{20.17}$$
This is a nonlinear least-squares problem since the function $C(\sigma(S, t), K_j, T_j)$
depends nonlinearly on the variables, namely the local volatility function $\sigma(S, t)$.
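To make the two calibration objectives concrete, the sketch below evaluates the bid-ask violation of (20.16) and the mid-market least-squares error of (20.17) for a given list of model prices. The function names and the sample numbers are hypothetical. In an actual calibration the model prices would come from a numerical pricing scheme driven by the candidate volatility function, which is not reproduced here.

```python
def bid_ask_violation(model_prices, bids, asks):
    """Objective (20.16): total violation of bid <= model price <= ask."""
    return sum(max(b - c, 0.0) + max(c - a, 0.0)
               for c, b, a in zip(model_prices, bids, asks))

def mid_price_least_squares(model_prices, bids, asks):
    """Objective (20.17) with mid-market prices C_j = (bid_j + ask_j) / 2."""
    return sum((c - 0.5 * (b + a)) ** 2
               for c, b, a in zip(model_prices, bids, asks))

# Hypothetical quotes for three options and prices from some candidate model.
model_prices = [10.0, 5.2, 2.0]
bids = [9.8, 5.3, 1.5]
asks = [10.2, 5.5, 1.9]
violation = bid_ask_violation(model_prices, bids, asks)
squared_error = mid_price_least_squares(model_prices, bids, asks)
```

The first option is priced inside its bid-ask interval and contributes nothing to the violation, while the second falls below its bid and the third exceeds its ask. The least-squares objective penalizes any deviation from the mid price, including deviations that stay inside the interval.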

While the calibration of the local volatility function to the observed prices
using the objective functions in (20.16) and (20.17) is important and desirable,
there are additional properties that are desirable in the local volatility function.
The most common feature sought in existing models is regularity or smoothness.
For example, in Lagnado and Osher (1997) the authors try to achieve a smooth
volatility function by modifying the objective function in (20.17) as follows:

$$\min_{\sigma(S,t) \in \mathcal{H}} \; \sum_{j=1}^{n} \big(C(\sigma(S, t), K_j, T_j) - C_j\big)^2 + \lambda \|\nabla \sigma(S, t)\|_2. \tag{20.18}$$

Here, $\lambda$ is a positive tradeoff parameter and $\|\cdot\|_2$ represents the $L^2$-norm in $\mathcal{H}$.


Large deviations in the volatility function would result in a high value for the
norm of the gradient function, and by penalizing such occurrences, the formulation
above encourages a smoother solution to the problem. The most appropriate
value for the tradeoff parameter $\lambda$ must be determined experimentally. To solve
the resulting problem numerically, one must discretize the volatility function on
the underlying price and time grid. Even for a relatively coarse discretization of
the $S$ and $t$ spaces, one can easily end up with an optimization problem with
many variables.

An alternative strategy is to build the smoothness into the volatility function 
by modeling it with spline functions. The use of the spline functions not only 
guarantees the smoothness of the resulting volatility function estimates but 
also reduces the degrees of freedom in the problem. As a consequence, the 
optimization problem to be solved has many fewer variables and is easier. This 
strategy is proposed in Coleman et al. (1999b) and we review it below. 

We start by assuming that $\sigma(S, t)$ is a bi-cubic spline. While higher-order
splines can also be used, cubic splines often offer a good balance between flexibility and complexity. Next we choose a set of spline knots at points $(\bar{S}_i, \bar{t}_i)$
for $i = 1, \ldots, k$. If the value of the volatility function at these points is given
by $\bar{\sigma}_i := \sigma(\bar{S}_i, \bar{t}_i)$, the interpolating cubic spline that goes through these knots
and satisfies a particular end condition (such as the natural spline end condition
of linearity at the boundary knots) is uniquely determined. In other words, to
completely determine the volatility function as a natural bi-cubic spline (and
therefore to determine the resulting call option prices) we have $k$ degrees of
freedom represented with the choices $\bar{\sigma} = (\bar{\sigma}_1, \ldots, \bar{\sigma}_k)$. Let $\Sigma(S, t, \bar{\sigma})$ be the
bi-cubic spline local volatility function obtained by setting $\sigma(\bar{S}_i, \bar{t}_i) := \bar{\sigma}_i$. Let
$C(\Sigma(S, t, \bar{\sigma}), S, t)$ denote the resulting call price function. Then the analog of the
objective function (20.17) is

$$\min_{\bar{\sigma} \in \mathbb{R}^k} \; \sum_{j=1}^{n} \big(C(\Sigma(S, t, \bar{\sigma}), K_j, T_j) - C_j\big)^2. \tag{20.19}$$


One can introduce positive weights $w_j$ for each of the terms in the objective
function above to address different accuracies or confidence in the call prices
$C_j$. One can also introduce lower and upper bounds $l_i$ and $u_i$ for the volatilities
at each knot to incorporate additional information that may be available from
historical data, etc. This way, we form the following nonlinear least-squares
problem with $k$ variables:

$$\begin{array}{ll} \min\limits_{\bar{\sigma} \in \mathbb{R}^k} & f(\bar{\sigma}) := \sum_{j=1}^{n} w_j \big(C(\Sigma(S, t, \bar{\sigma}), K_j, T_j) - C_j\big)^2 \\ \text{s.t.} & l \le \bar{\sigma} \le u. \end{array} \tag{20.20}$$


It should be noted that the formulation above will not be appropriate if there 
are many more knots than prices, that is, if k is much larger than n. In this case, 
the problem will be underdetermined and solutions may exhibit “overfitting”. 
There should be fewer knots than available option prices. 

The problem (20.20) is a standard nonlinear optimization problem except that
the objective function $f(\bar{\sigma})$ and in particular the function $C(\Sigma(S, t, \bar{\sigma}), K_j, T_j)$
depends on the decision variables $\bar{\sigma}$ in a complicated and non-explicit manner.
Since most of the nonlinear optimization methods we discussed in the previous
section require at least the gradient of the objective function (and sometimes its
Hessian matrix as well), this may sound alarming. Without an explicit expression
for $f$, its gradient must be estimated either using a finite difference scheme or
using automatic differentiation. Coleman et al. (1999b) implement both alternatives and report that local volatility functions can be estimated very accurately
using these strategies. They also test the hedging accuracy of different delta-hedging strategies, one using a constant volatility estimation and another using
the local volatility function produced by the strategy above. These tests indicate
that the hedges obtained from the local volatility function are significantly more
accurate.
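A finite difference gradient estimate of the kind mentioned above can be sketched as follows. The central difference formula is standard. The function name, the step `h`, and the stand-in objective are our own assumptions, and we do not claim that this is the particular scheme used by Coleman et al. (1999b). The stand-in objective has a known gradient, so the estimate can be checked.

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Central finite difference estimate of the gradient of a black-box f."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        # Two evaluations of f per coordinate.
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad

# Hypothetical stand-in for a least-squares calibration objective.
targets = np.array([1.0, 2.0, 0.5])
objective = lambda x: np.sum((np.exp(x) - targets) ** 2)
x_test = np.array([0.1, 0.5, -0.3])
g_fd = fd_gradient(objective, x_test)
g_exact = 2.0 * (np.exp(x_test) - targets) * np.exp(x_test)
```

Each gradient estimate costs $2k$ evaluations of the objective for $k$ variables. This becomes significant when every evaluation requires a numerical option pricing run, and it is one reason why keeping the number of spline knots small is attractive.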


Exercises 


Exercise 20.1 Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a differentiable function at the point
$x \in \mathbb{R}^n$. Consider the first-order Taylor approximation to $f$ around $x$:

$$\tilde{f}(p) = f(x) + \nabla f(x)^T p.$$

Show that if $\nabla f(x) \ne 0$ then the solution to the problem

$$\min_{\|p\| \le 1} \tilde{f}(p)$$

is

$$p^* = -\frac{1}{\|\nabla f(x)\|} \nabla f(x).$$

In other words, it is the unit vector in the direction of the negative gradient.

Exercise 20.2 


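(a) Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$. Show that the mixed binary program

$$\begin{array}{ll} \min & c^T x \\ \text{s.t.} & Ax \le b \\ & x_j \in \{0, 1\}, \; j \in J \end{array}$$

is equivalent to the nonlinear program

$$\begin{array}{ll} \min & c^T x \\ \text{s.t.} & Ax \le b \\ & x_j(1 - x_j) = 0, \; j \in J. \end{array}$$

(b) Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $c \in \mathbb{R}^n$. Show that the mixed integer program

$$\begin{array}{ll} \min & c^T x \\ \text{s.t.} & Ax \le b \\ & x_j \in \mathbb{Z}, \; j \in J \end{array}$$

is equivalent to the nonlinear program

$$\begin{array}{ll} \min & c^T x \\ \text{s.t.} & Ax \le b \\ & \sin(\pi x_j) = 0, \; j \in J. \end{array}$$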


Exercise 20.3 Let n be a positive integer. Show that for suitable differentiable 
functions f,g,h the statement 


“There exist $x, y, z \in \mathbb{Z}$ all different such that $x^n + y^n = z^n$.”
can be equivalently stated as 


“The optimal value of 


$$\begin{array}{ll} \min & f(x) \\ \text{s.t.} & g(x) \le 0 \\ & h(x) = 0 \end{array}$$


is zero.” 


What does that suggest about the difficulty of solving generic nonlinear pro- 
gramming problems? 


Appendices 


Appendix Basic Mathematical Facts 


A.1 Matrices and Vectors 


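For two positive integers $m$ and $n$, let $\mathbb{R}^{m \times n}$ denote the space of $m \times n$ matrices
with real entries. The transpose of an $m \times n$ matrix

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \in \mathbb{R}^{m \times n}$$

is the $n \times m$ matrix

$$A^T = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix} \in \mathbb{R}^{n \times m}.$$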

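A square matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A^T = A$.
The product of two matrices $A = (a_{ik}) \in \mathbb{R}^{m \times n}$ and $B = (b_{kj}) \in \mathbb{R}^{n \times p}$ is the
matrix $C = AB = (c_{ij}) \in \mathbb{R}^{m \times p}$ defined componentwise as follows:

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \qquad i = 1, \ldots, m, \; j = 1, \ldots, p.$$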


Observe that the matrix product AB is well defined if the number of columns 
of A and the number of rows of B match. 

The identity matrix $I \in \mathbb{R}^{n \times n}$ is the matrix with components equal to 1 on the
diagonal and all other components equal to 0. Observe that for all $A \in \mathbb{R}^{m \times n}$,
$B \in \mathbb{R}^{n \times p}$ we have $AI = A$ and $IB = B$. If $A, B \in \mathbb{R}^{n \times n}$ and $AB = BA = I$,
then we say that $B$ is the inverse of $A$ and write $B = A^{-1}$.

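The following kinds of matrix-vector products arise often. Suppose

$$c = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}, \quad Q = \begin{bmatrix} q_{11} & q_{12} & \cdots & q_{1n} \\ q_{12} & q_{22} & \cdots & q_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ q_{1n} & q_{2n} & \cdots & q_{nn} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$

Then

$$c^T x = \begin{bmatrix} c_1 & \cdots & c_n \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = c_1 x_1 + \cdots + c_n x_n$$

and

$$Qx = \begin{bmatrix} q_{11} & \cdots & q_{1n} \\ \vdots & \ddots & \vdots \\ q_{1n} & \cdots & q_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} q_{11} x_1 + \cdots + q_{1n} x_n \\ \vdots \\ q_{1n} x_1 + \cdots + q_{nn} x_n \end{bmatrix}.$$

So

$$x^T Q x = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} \begin{bmatrix} q_{11} x_1 + \cdots + q_{1n} x_n \\ \vdots \\ q_{1n} x_1 + \cdots + q_{nn} x_n \end{bmatrix} = \sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij} x_i x_j$$

$$= q_{11} x_1^2 + \cdots + q_{nn} x_n^2 + 2 q_{12} x_1 x_2 + 2 q_{13} x_1 x_3 + \cdots + 2 q_{n-1,n} x_{n-1} x_n.$$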


A symmetric matrix $M \in \mathbb{R}^{n \times n}$ is positive semidefinite if $x^T M x \ge 0$ for all
$x \in \mathbb{R}^n$ and it is positive definite if it satisfies the stronger condition $x^T M x > 0$
for all non-zero $x \in \mathbb{R}^n$.


Convex Sets and Convex Functions 


A set $S \subseteq \mathbb{R}^n$ is convex if for all $x, y \in S$ the straight segment joining $x$ and $y$
is contained in $S$; that is,

$$[x, y] := \{\lambda x + (1 - \lambda) y : \lambda \in [0, 1]\} \subseteq S.$$

The following are types of convex sets that appear often in optimization models.
It is easy to verify that they are indeed convex sets.

Half-space: Given a non-zero $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$ the half-space

$$\{x \in \mathbb{R}^n : a^T x \le b\}$$

is convex.

Intersections of convex sets: Given a collection $S_i \subseteq \mathbb{R}^n$, for $i \in I$, of convex
sets, their intersection $\bigcap_{i \in I} S_i$ is a convex set.

Affine images and preimages: Given a convex set $S \subseteq \mathbb{R}^n$, matrices $A \in \mathbb{R}^{m \times n}$,
$B \in \mathbb{R}^{n \times p}$ and vectors $a \in \mathbb{R}^m$, $b \in \mathbb{R}^n$, the sets

$$A(S) + a = \{Ax + a : x \in S\} \subseteq \mathbb{R}^m$$

and

$$B^{-1}(S + b) := \{v \in \mathbb{R}^p : Bv - b \in S\} \subseteq \mathbb{R}^p$$

are convex.


Suppose $S \subseteq \mathbb{R}^n$ is a convex set. A function $f : S \to \mathbb{R}$ is convex if, for all
$x, y \in S$ and $\lambda \in [0, 1]$,

$$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y).$$


A common way of dealing with the domain of a function is to consider extended
valued functions; that is, functions defined on the whole space $\mathbb{R}^n$ and allowed to
take the value $\infty$. The domain of an extended valued function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$
is the set

$$\operatorname{dom}(f) := \{x \in \mathbb{R}^n : f(x) < \infty\}.$$

An alternative and equivalent definition of convexity is the following. An
extended valued function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is convex if the set

$$\operatorname{epigraph}(f) := \{(x, t) \in \mathbb{R}^{n+1} : f(x) \le t\}$$

is convex. Observe that if $f$ is convex, then its domain is a convex set. Furthermore, if $f$ is convex then for all $\ell \in \mathbb{R}$ the sublevel set $\{x \in \mathbb{R}^n : f(x) \le \ell\}$ is a
convex set.

The following relationship between differentiability and convexity is particu- 
larly useful to verify that functions are convex. 


Theorem A.1 Suppose $f : S \to \mathbb{R}$ is twice differentiable on the open set
$S \subseteq \mathbb{R}^n$ and $C \subseteq S$ is a convex set. Then $f$ is convex on $C$ if and only if $\nabla^2 f(x)$
is positive semidefinite for all $x \in C$.


As an immediate consequence of Theorem A.1 it follows that every affine
function $f(x) = c^T x + b$ is convex. It also follows that a quadratic function

$$f(x) = \frac{1}{2} x^T Q x + c^T x + b$$

is convex if and only if $Q$ is positive semidefinite.
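In computations, the convexity of a quadratic function can be checked through the eigenvalues of $Q$, using the standard fact that a symmetric matrix is positive semidefinite exactly when all of its eigenvalues are non-negative. The sketch below is illustrative: the function name, the tolerance `tol` that absorbs rounding error, and the two sample matrices are our own choices.

```python
import numpy as np

def is_positive_semidefinite(Q, tol=1e-10):
    """Check that a symmetric matrix has no eigenvalue below -tol."""
    Q = np.asarray(Q, dtype=float)
    if not np.allclose(Q, Q.T):
        raise ValueError("Q must be symmetric")
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues.
    return bool(np.linalg.eigvalsh(Q).min() >= -tol)

# Hypothetical examples.
Q_convex = np.array([[2.0, -1.0], [-1.0, 2.0]])    # eigenvalues 1 and 3
Q_nonconvex = np.array([[1.0, 2.0], [2.0, 1.0]])   # eigenvalues -1 and 3
convex_flags = (is_positive_semidefinite(Q_convex),
                is_positive_semidefinite(Q_nonconvex))
```

The second matrix fails the test because $x^T Q x = -2$ for $x = (1, -1)$, so the corresponding quadratic function is not convex.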


Calculus of Variations: the Euler Equation 


The calculus of variations is the analog of calculus that works with functionals rather than functions. Functionals are often integrals of functions. Many
problems in the calculus of variations arose from the need to find a function
that optimizes a given functional. The Euler equation for the minimization of
a functional subject to boundary conditions is a kind of first-order optimality
condition for the problem

$$\min_{x} \int_0^T L(t, x(t), \dot{x}(t))\,dt, \qquad x(0) = x_0, \; x(T) = x_T.$$

The optimal solution $x^*(t)$ must satisfy the differential equation

$$\frac{d}{dt} L_{\dot{x}}(t, x^*(t), \dot{x}^*(t)) = L_x(t, x^*(t), \dot{x}^*(t)), \tag{A.1}$$

where $L_x$ and $L_{\dot{x}}$ denote the partial derivatives of $L$ with respect to its second and third arguments.

Equation (A.1) is called the Euler equation. For a derivation of this optimality
condition as well as a detailed discussion on the interesting subject of calculus
of variations, see Fleming and Rishel (1975).
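
As a short worked illustration, chosen by us and not taken from the reference above, consider the integrand $L(t, x, \dot{x}) = \frac{1}{2}\dot{x}^2$. Writing the Euler equation in the form $\frac{d}{dt} L_{\dot{x}} = L_x$, the computation is:

```latex
\begin{align*}
L(t, x, \dot{x}) &= \tfrac{1}{2}\dot{x}^2,
  & L_x &= 0, & L_{\dot{x}} &= \dot{x}, \\
\frac{d}{dt} L_{\dot{x}} = L_x
  &\;\Longrightarrow\; \ddot{x}^*(t) = 0
  &&\Longrightarrow\; x^*(t) = x_0 + \frac{x_T - x_0}{T}\, t .
\end{align*}
```

The two constants of integration are fixed by the boundary conditions $x(0) = x_0$ and $x(T) = x_T$, so the candidate solution is the straight line joining the two endpoints.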


References 


Alizadeh F. (1991). Combinatorial Optimization with Interior Point Methods and 
Semi-definite Matrices. PhD thesis, University of Minnesota. 

Almgren R. and N. Chriss (2000). Optimal execution of portfolio transactions. 
Journal of Risk, 3:5-39. 

Almgren R., C. Thum, E. Hauptmann, and H. Li (2005). Direct estimation of 
equity market impact. Risk, 18:58-62. 

Andersson F., H. Mausser, D. Rosen, and S. Uryasev (2001). Credit risk opti- 
mization with conditional value-at-risk criterion. Mathematical Programming, 
89:273-291. 

Artzner P., F. Delbaen, J. Eber, and D. Heath (1999). Coherent measures of risk. 
Mathematical Finance, 9:203-228. 

Back K. (2010). Asset Pricing and Portfolio Choice Theory. Oxford University 
Press. 

Basel Committee on Banking Supervision (2011). Basel III: A Global Regulatory 
Framework for More Resilient Banks and Banking Systems. Technical Report, 
Bank for International Settlements. 

Bawa V.S., S.J. Brown, and R.W. Klein (1979). Estimation Risk and Optimal 
Portfolio Choice. North-Holland. 

Bellman R. (1954). The theory of dynamic programming. Bulletin of the 
American Mathematical Society, 60:503-515. 

Bellman R. (1957). Dynamic Programming. Princeton University Press. 

Ben-Tal A. and A. Nemirovski (1998). Robust convex optimization. Mathematics 
of Operations Research, 23(4):769-805. 

Ben-Tal A. and A. Nemirovski (2002). Robust optimization — methodology and 
applications. Mathematical Programming, 92(3):453-480. 

Ben-Tal A., L. El Ghaoui, and A. Nemirovski (2009). Robust Optimization. 
Princeton University Press. 

Bertsekas D. (1999). Nonlinear Programming. Athena Scientific. 

Bertsekas D. (2005). Dynamic Programming and Optimal Control. Athena 
Scientific. 

Bertsimas D. and A. Lo (1998). Optimal control of execution costs. Journal of 
Financial Markets, 1:1-50. 

Bertsimas D. and J. Tsitsiklis (1997). Introduction to Linear Optimization. 
Athena Scientific. 




Bertsimas D., V. Gupta, and I.Ch. Paschalidis (2012). Inverse optimization: a
new perspective on the Black-Litterman model. Operations Research, 1389-1403.

Birge J. and F. Louveaux (1997). Introduction to Stochastic Programming. 
Springer. 

Black F. and R. Litterman (1992). Global portfolio optimization. Financial 
Analysts Journal, 48:28-43. 

Black F. and M. Scholes (1973). The pricing of options and corporate liabilities. 
Journal of Political Economy, 81:637-659. 

Blume M. (1975). Betas and the regression tendencies. Journal of Finance, 
30:785-795. 

Boyd S. and L. Vandenberghe (2004). Convex Optimization. Cambridge Univer- 
sity Press. 

Brinson G., B. Singer, and G. Beebower (1991). Determinants of portfolio 
performance. Financial Analysts Journal, 47:40-48. 

Broadie M. (1993). Computing efficient frontiers using estimated parameters. 
Annals of Operations Research, 45(1):21-58. 

Campbell J., A. Lo, and A. MacKinlay (1997). The Econometrics of Financial 
Markets. Princeton University Press. 

Carino D., T. Kent, D. Myers, C. Stacy, M. Sylvanus, A. Turner, K. Watanabe, 
and W. Ziemba (1994). The Russell-Yasuda Kasai model: an asset/liability 
model for a Japanese insurance company using multistage stochastic program- 
ming. Interfaces, 24(1):29-49. 

Ceria S. and R. Stubbs (2006). Incorporating estimation errors into portfolio 
selection: robust portfolio selection. Journal of Asset Management, 7:109-127. 

Choueifaty Y. and Y. Coignard (2008). Toward maximum diversification. Journal 
of Portfolio Management, 40-51. 

Chvátal V. (1983). Linear Programming. W.H. Freeman. 

Coleman T.F., Y. Kim, Y. Li, and A. Verma (1999a). Dynamic Hedging in a 
Volatile Market. Technical Report, Cornell Theory Center. 

Coleman T.F., Y. Kim, Y. Li, and A. Verma (1999b). Reconstructing the
unknown volatility function. Journal of Computational Finance, 2:77-102.

Conforti M., G. Cornuéjols, and G. Zambelli (2014). Integer Programming.
Springer.

Connor G. (1995). The three types of factor models: a comparison of their 
explanatory power. Financial Analysts Journal, 51:42-46.

Constantinides G. (1983). Capital market equilibrium with personal tax. Econo- 
metrica, 51:611-636. 

Constantinides G. (1984). Optimal stock trading with personal taxes: impli- 
cations for prices and the abnormal January returns. Journal of Financial 
Economics, 13:65-89. 

Cornuéjols G., M. Fisher, and G. Nemhauser (1977). Location of bank accounts 
to minimize float: an analytical study of exact and approximate algorithms. 
Management Science, 23:229-263. 




Cox J., S. Ross, and M. Rubinstein (1979). Option pricing: a simplified approach. 
Journal of Financial Economics, 7:229-263. 

Dammon R., C. Spatt, and H. Zhang (2001). Optimal consumption and 
investment with capital gains taxes. Review of Financial Studies, 14:583-616.

Dammon R., C. Spatt, and H. Zhang (2004). Optimal asset location and alloca- 
tion with taxable and tax-deferred investing. Journal of Finance, 59:999-1038. 

Dantzig G. (1963). Linear Programming and Extensions. Princeton University 
Press. 

Dantzig G. (1990). The diet problem. Interfaces, 20(4):43-47. 

Dantzig G., R. Fulkerson, and S. Johnson (1954). Solution of a large-scale 
traveling-salesman problem. Operations Research, 2:393-410. 

Davarnia D. and G. Cornuéjols (2017). From estimation to optimization via 
shrinkage. Operations Research Letters, 45:642-646. 

De Vries S. and R. Vohra (2003). Combinatorial auctions: a survey. INFORMS 
Journal on Computing, 15(3):284-309. 

Duffie D. (2001). Dynamic Asset Pricing Theory. Princeton University Press. 

Efron B. and C. Morris (1977). Stein’s paradox in statistics. Scientific American, 
236:119-127. 

El Ghaoui L. and H. Lebret (1997). Robust solutions to least-squares problems 
with uncertain data. SIAM Journal on Matrix Analysis and Applications, 
18(4):1035-1064. 

El Ghaoui L., F. Oustry, and H. Lebret (1998). Robust solutions to uncertain 
semidefinite programs. SIAM Journal on Optimization, 9(1):33-52. 

Engle R. (1982). Autoregressive conditional heteroscedasticity with estimates of 
the variance of United Kingdom inflation. Econometrica, 50:987-1007.

Fabozzi F. (2004). Bonds, Markets, Analysis and Strategies, fifth edition. 
Prentice-Hall. 

Fabozzi F., P. Kolm, D. Pachamanova, and S. Focardi (2007). Robust Portfolio 
Optimization and Management. Wiley. 

Fama E. and K. French (1992). The cross-section of expected stock returns. 
Journal of Finance, 47:427-465.

Fleming W. and R. Rishel (1975). Deterministic and Stochastic Optimal Control. 
Springer. 

Friedman J., T. Hastie, and R. Tibshirani (2001). The Elements of Statistical 
Learning, volume 1. Springer. 

Gârleanu N. and L. Pedersen (2013). Dynamic trading with predictable returns 
and transaction costs. Journal of Finance, 68(6):2309-2340. 

Goldfarb D. and G. Iyengar (2003). Robust portfolio selection problems. 
Mathematics of Operations Research, 28(1):1-38. 

Gomory R.E. (1958). Outline of an algorithm for integer solutions to linear 
programs. Bulletin of the American Mathematical Society, 64:275-278.

Gomory R.E. (1960). An Algorithm for the Mixed Integer Problem. Technical 
Report RM-2597, The Rand Corporation. 




Gondzio J. and R. Kouwenberg (2001). High performance for asset liability 
management. Operations Research, 49:879-891. 

Grinold R. and R. Kahn (1999). Active Portfolio Management: A Quantitative 
Approach for Producing Superior Returns and Controlling Risk, second edition. 
McGraw-Hill. 

Güler O. (2010). Foundations of Optimization. Springer.

Halldórsson B. and R. Tütüncü (2003). An interior-point method for a class 
of saddle-point problems. Journal of Optimization Theory and Applications, 
116(3):559-590. 

Harrison J. and D. Kreps (1979). Martingales and arbitrage in multiperiod 
security markets. Journal of Economic Theory, 20:381-408. 

Harrison J. and S. Pliska (1981). Martingales and stochastic integrals in the 
theory of continuous trading. Stochastic Processes and their Applications, 
11:215-260. 

Heath D., R. Jarrow, and A. Morton (1992). Bond pricing and the term 
structure of interest rates: a new methodology for contingent claims 
valuation. Econometrica, 60:77-105. 

Herzel S. (2005). Arbitrage opportunities on derivatives: a linear programming 
approach. Dynamics of Continuous, Discrete and Impulsive Systems. Series B: 
Applications and Algorithms, 12:589-606. 

Hodges S. and S. Schaefer (1977). A model for bond portfolio improvement. 
Journal of Financial and Quantitative Analysis, 12:243-260. 

Hoyland K. and S.W. Wallace (2001). Generating scenario trees for multistage 
decision problems. Management Science, 47:295-307. 

Jorion P. (1986). Bayes-Stein estimation for portfolio analysis. Journal of 
Financial and Quantitative Analysis, 21:279-292. 

Jorion P. (1992). Portfolio optimization in practice. Financial Analysts Journal, 
48:68-74. 

Jorion P. (2003). Portfolio optimization with tracking-error constraints. 
Financial Analysts Journal, 59:70-82. 

Karmarkar N. (1984). A new polynomial time algorithm for linear programming. 
Combinatorica, 4:373-395. 

Kelly J.L. (1956). A new interpretation of information rate. Bell System 
Technical Journal, 35:917-926.

Klaassen P. (2002). Comment on “Generating scenario trees for multistage 
decision problems”. Management Science, 48:1512-1516. 

Kocuk B. and G. Cornuéjols (2017). Incorporating Black-Litterman Views
in Portfolio Construction When Stock Returns Are a Mixture of Normals.
Technical Report, Carnegie Mellon University, Pittsburgh.

Konno H. and H. Yamazaki (1991). Mean-absolute deviation portfolio optimization
model and its applications to Tokyo stock market. Management Science,
37(5):519-531.




Kouwenberg R. (2001). Scenario generation and stochastic programming models 
for asset liability management. European Journal of Operational Research, 
134:279-292. 

Kritzman M. (2002). Puzzles of Finance: Six Practical Problems and their 
Remarkable Solutions. Wiley. 

Lagnado R. and S. Osher (1997). Reconciling differences. Risk, 10:79-83. 

Land A.H. and A.G. Doig (1960). An automatic method of solving discrete 
programming problems. Econometrica, 28:497-520.

Ledoit O. and M. Wolf (2003). Improved estimation of the covariance matrix of 
stock returns with an application to portfolio selection. Journal of Empirical 
Finance, 10:603-621.

Ledoit O. and M. Wolf (2004). A well-conditioned estimator for large-dimensional 
covariance matrices. Journal of Multivariate Analysis, 88:365-411. 

Lintner J. (1965). The valuation of risk assets and the selection of risky 
investments in stock portfolios and capital budgets. Review of Economics and 
Statistics, 47:13-37. 

Litterman B. (2003). Modern Investment Management: An Equilibrium 
Approach. Wiley. 

Markowitz H. (1952). Portfolio selection. Journal of Finance, 7:77-91. 

Merton R. (1973). Theory of rational option pricing. Bell Journal of Economics
and Management Science, 4:141-183.

Meucci A. (2005). Risk and Asset Allocation. Springer.

Meucci A. (2010). Return calculations for leveraged securities and portfolios.
GARP Risk Professional, October:40-43.

Michaud R. and R. Michaud (2008). Efficient Asset Management. Oxford
University Press.

Mossin J. (1966). Equilibrium in a capital asset market. Econometrica,
34:768-783.

Nesterov Y. (2004). Introductory Lectures on Convex Optimization: A Basic
Course. Kluwer Academic.

Nesterov Y. and A. Nemirovskii (1994). Interior-Point Polynomial Algorithms
in Convex Programming. SIAM.

Nesterov Y. and M. Todd (1997). Self-scaled barriers and interior-point methods
for convex programming. Mathematics of Operations Research, 22:1-42.

Nesterov Y. and M. Todd (1998). Primal-dual interior-point methods for
self-scaled cones. SIAM Journal on Optimization, 8:324-364.

Nocedal J. and S. Wright (2006). Numerical Optimization. Springer. 

Padberg M. and G. Rinaldi (1987). Optimization of a 532-city symmetric traveling
salesman problem by branch and cut. Operations Research Letters, 6:1-7.


Perold A. (1988). The implementation shortfall: paper versus reality. Journal of
Portfolio Management, 14:4-9. 

Pokutta S. and C. Schmaltz (2012). Optimal bank planning under Basel III 
regulations. Capco Institute Journal of Financial Transformation, 34:165-174. 




Porteus E. (2002). Foundations of Stochastic Inventory Theory. Stanford
University Press.

Poundstone W. (2005). Fortune’s Formula: The Untold Story of the Scientific 
Betting System that Beat the Casinos and Wall Street. Hill and Wang. 

Ragsdale C. (2007). Spreadsheet Modeling & Decision Analysis: A Practical 
Introduction to Management Science, fifth edition. Thomson South-Western. 

Renegar J. (2001). A Mathematical View of Interior-Point Methods in Convex 
Optimization. SIAM. 

Rockafellar T. and S. Uryasev (2000). Optimization of conditional value-at-risk. 
Journal of Risk, 2:21-41. 

Ronn E. I. (1987). A new linear programming approach to bond portfolio
management. Journal of Financial and Quantitative Analysis, 22:439-466.

Roos C., T. Terlaky, and J.-Ph. Vial (2005). Interior Point Methods for Linear
Optimization, second edition. Springer.

Rosenberg B. (1974). Extra-market components of covariance in security returns. 
Journal of Financial and Quantitative Analysis, 9(2):263-274. 

Ross S. (1976). The arbitrage theory of capital asset pricing. Journal of 
Economic Theory, 13:341-360.

Rustem B. and M. Howe (2002). Algorithms for Worst-Case Design and 
Applications to Risk Management. Princeton University Press. 

Schaefer S.M. (1982). Tax induced clientele effects in the market for British 
government securities. Journal of Financial Economics, 10:121-159. 

Scherer B. (2002). Portfolio resampling: review and critique. Financial Analysts 
Journal, 58:98-109. 

Scherer B. (2007). Can robust portfolio optimization help to build better 
portfolios? Journal of Asset Management, 7:374-387. 

Schmieta S. and F. Alizadeh (2001). Associative and Jordan algebras, and 
polynomial time interior-point algorithms for symmetric cones. Mathematics 
of Operations Research, 26(3):543-564. 

Schmieta S. and F. Alizadeh (2003). Extension of primal-dual interior point
algorithms to symmetric cones. Mathematical Programming, 96(3):409-438.

Shapiro A., D. Dentcheva, and A. Ruszczynski (2009). Lectures on Stochastic
Programming: Modeling and Theory. SIAM.

Sharpe W. (1964). Capital asset prices: a theory of market equilibrium under 
conditions of risk. Journal of Finance, 19(3):425-442. 

Sharpe W. (1992). Asset allocation: management style and performance
measurement. Journal of Portfolio Management, 18(2):7-19.

Shleifer A. (2000). Inefficient Markets. Oxford University Press. 

Shreve S. (2000). Stochastic Calculus for Finance, volumes I and II. Springer. 

Sra S., S. Nowozin, and S. Wright (2012). Optimization for Machine Learning. 
MIT Press. 

Stein C. (1956). Inadmissibility of the usual estimator for the mean of
multivariate normal distribution. In Proceedings of the Third Berkeley Symposium
on Mathematical Statistics and Probability, pp. 197-206.




Sturm J. (1999). Using SeDuMi 1.02, a Matlab toolbox for optimization over 
symmetric cones. Optimization Methods and Software, 11:625-653. 

Tibshirani R. (1996). Regression shrinkage and selection via the lasso. Journal 
of the Royal Statistical Society, 58:267-288.

Tobin J. (1958). Liquidity preference as behavior towards risk. Review of 
Economic Studies, 25(2):65-86. 

Toh K., M. Todd, and R. Tütüncü (1999). SDPT3 - a MATLAB software
package for semidefinite programming. Optimization Methods and Software, 
11:545-581. 

Tuckman B. (2002). Fixed Income Securities: Tools for Today’s Markets. Wiley. 

Tütüncü R. and M. Koenig (2004). Robust asset allocation. Annals of Operations
Research, 132:157-187. 

Vapnik V. (2013). The Nature of Statistical Learning Theory. Springer. 

Werner R. (2010). Costs and Benefits of Robust Optimization. Technical Report, 
Technical University of München. 

Ye Y. (1997). Interior-Point Algorithms: Theory and Analysis. Wiley. 

Zhao Y. and W.T. Ziemba (2001). A stochastic programming model using an 
endogenously determined worst case risk measure for dynamic asset allocation. 
Mathematical Programming, 89(2):293-309. 


Index 


accrued interest, 42
active constraint, see binding constraint
active return, 105
active risk, 105
active set, 313
active-set methods, 81
adaptive decision, 7
adjoint, 285
Almgren-Chriss model, 201
alpha
    Jensen, 113
    t-statistic, 113
alpha of a security, 104
APT, 111
arbitrage, 55
arbitrage pricing theory, see APT
Armijo-Goldstein condition, 309
asset allocation, 95
asset-liability management, 262
auction
    combinatorial, 161
autoregressive model, 256


backtracking, 309
Basel III, 15
basic feasible solution, 25
basis, 25
    optimal, 25
Bellman’s optimality principle, 215, 216, 221
Benders decomposition, 255
Benders decomposition method, 177
bequest, 227
beta
    long-short threshold, 119
beta of a security, 104
bid, 161
bid-ask spread, 66
binary program, 140
binding constraint, 3
binomial lattice, 238
binomial lattice model, 238
binomial pricing model, 56
Black-Litterman model, 126
Black-Scholes-Merton equation, 245
Black-Scholes-Merton option pricing formula, 316
blocking constraint, 82
bond
    clean price, 42
    coupon rate, 42
    dirty price, 42
    maturity date, 42
    term to maturity, 42
    yield, 42
bond allocation, 12
bond portfolio
    dedicated, 35
bootstrapping, 131
branch-and-bound method, 150, 151
branch-and-bound tree, 152
branch-and-cut method, 150
branching, 151
Brownian motion, 315
BSM formula, 316
bundle, 161


capital allocation line, 98
capital asset pricing model, see CAPM
CAPM, 98, 108, 111, 112
captured value, 202
cash flow problems, 44
central path, 29, 83
clientele effects, 63
clustering, 142
combinatorial auction problem, 161
complementary slackness conditions, 24
conditional value at risk, see CVaR
cone
    second-order, 277
    symmetric, 287
conic program, 277
    dual, 282
    primal, 282
constraint
    turnover, 101
constraint set, 3
consumption, 174
contingent claim, 55
convex function, 325
convex optimization, 4
convex set, 324
cutting plane, 154
cutting-plane method, 150, 154
CVaR, 181, 184


decision variables, 3
descent direction, 309
deterministic model, 4
diffusion model, 316
dispersion measure, 181
diversification, 134
    maximum, 135
drift, 244
dual cone, 282
dual problem, 21


efficient frontier, 93, 124
estimator
    inadmissible, 129
    James-Stein shrinkage, 129
    risk, 129
Euler equation, 206
event tree, 250
excess return, 102


factor exposure, see factor loading
factor loading, 107
factor model, 106
factor portfolio, 120
Farkas’s lemma, 21, 34
feasibility cut, 178
feasible point, 3
feasible region, 3
feasible solution, see feasible point
Fisher-Weil convexity, 40
Fisher-Weil dollar convexity, 40
Fisher-Weil dollar duration, 39
Frobenius inner product, 281
fund allocation
    linear programming model, 12
fundamental theorem of asset pricing, 56


geometric Brownian motion model, 244
Gomory mixed integer cut, 155
Gordan’s theorem, 22, 34
gradient descent method, 309


homogenization, 102 
hyperplane separation theorem, 34 


ice-cream cone, see cone, second-order
immunization, 39
immunized portfolio, 39
implementation shortfall, see total cost of trading
implied volatility, 273
index fund, 165
infeasible problem, 3
information ratio, 105
insurance company ALM problem, 263
interior-point method, 28
    infeasibility, 30
    nonlinear programming, 313
    quadratic program, 83


Jacobian, 310 
Jensen’s alpha, 112 


Kelly criterion, 197 


L-shaped method, 177
Lagrange multiplier, 75
Lagrangian dual, 147
Lagrangian function, 22, 77
lasso regression, 87
line search, 309
line-search procedure, 30
linear independence constraint qualification, 307
linear optimization model, see linear programming model
linear program, 11
linear programming, 5
linear programming model, 11
    non-degenerate, 19
    standard form, 13
linear-quadratic regulator, 216
lockbox problem, 163
Lorentz cone, see cone, second-order
loss function, 193


MAD, 181
market completeness, 59
Markowitz mean-variance, 91
Markowitz mean-variance model, 8
master problem, 177
matrix
    inverse, 323
    positive semidefinite, 5, 280
mean absolute deviation, see MAD
mean-variance, 71, 296
    stochastic optimization, 175
mean-variance model
    basic, 94
    general, 100
mean-variance optimization model, 93
minimum position constraints, 168
mixed integer linear program, 140
mixed integer optimization, 4
mixed integer program, 140
mixed integer programming, 6
multi-period model, 4
multiple-factor risk model, 109


newsvendor problem, 173, 175, 177
Newton step, 29, 286
Newton’s method, 310
nonlinear program, 305
NP-hardness, 150


objective function, 3
one-fund separation theorem, 97
optimal decision rule, 216
optimal policy, 216
optimal solution, 3
optimal value, 3
optimality conditions, 24
optimality cut, 178, 180
optimization
    robust, 289
option
    American, 241
    European, 239
option pricing, 238


par, see principal value
pension fund, 263
performance analysis, 112
portfolio
    benchmark, 103
    characteristic, 96
    efficient, 93
    equally weighted, 133
    factor mimicking, 111
    minimum risk, 95
    risk-parity, 134
    tangency, 97
    value-weighted, 133
portfolio management, 8
portfolio optimization
    dynamic, 198
positive linear pricing rule, 55
primal problem, 21
principal, see principal value
principal value, 42
program
    semidefinite, 280
pruning a node, 152
pure integer linear program, 140


quadratic program, 71
    dual, 76
    primal, 76
    standard form, 71
quadratic programming, 5
    sequential, 313
quadratic programming model, see quadratic program


random sampling, 257
    adjusted, 257
recourse, 7
recourse problem, 177
reduced cost, 25
reduced gradient
    generalized, 312
regret, 292
relaxation, 141, 145
    linear programming, 145
resampled efficiency, 131
residual vector, 83
reward-to-risk ratio, see Sharpe ratio
Riccati equation, 220
ridge regression, 86
risk contribution, 134
    marginal, 134
risk management, 9
risk measure, 6, 175, 181
    coherent, 184
risk-neutral probability measure, 55
robust portfolio optimization, 299
robustness, 289
    constraint, 290
    objective, 291
    relative, 292
    sampling, 294


saddle-point problem, 292
scenario optimization, 176
scenario tree, 248
    arbitrage-free, 258
scenario trees
    construction, 256
second-order program, 277
security selection, 95, 103
selection return, 114
semidefinite program
    standard form, 281
sensitivity, 18, 38, 75
separation theorem, see hyperplane separation theorem
sequential system, 214
shadow price, 17, 38
Sharpe ratio, 101, 116
shrinkage estimators, 129
shrinkage factor, 129
shrinkage procedure, 108
shrinkage target, 129
signal, 111
simplex method, 25
    dual, 25, 27
single-factor risk model, 107
slack variable, 14
Slater condition, 284
static model, 4
Stein paradox, 129
Stiemke’s theorem, 22, 34
stochastic and dynamic optimization, 4
stochastic discount factor, 56
stochastic model, 4
stochastic optimization, 6, 173, 174
    two-stage with recourse, 6
    with recourse, 174
stochastic program
    linear two-stage, 175
stochastic programming, 248
    multi-stage, 248
stochastic sequential decision problem, 221
stochastic sequential system, 221
strong duality theorem, 21
style analysis, 113
subdifferential, 311
subgradient, 311
subgradient method, 311
support vector machine, 85
surplus variable, 14
synthetic option, 270
synthetic option strategy, 270

tangent subspace, 308
term structure, 39
    implied, 38
total cost of trading, 203
tracking error, see active risk
tractability, 4
trade list, 202
trading strategy
    execution, 202
trading trajectory, 202
treasury yield curve, 42
tree fitting, 257
two-fund separation theorem, 96
two-fund theorem, 115


unbounded problem, 4
uncertainty set, 289
    ellipsoidal, 290
urgency, 206
utility, 173
    logarithmic, 199
    power, 199
    quadratic, 93


value at risk, see VaR
    conditional, see CVaR
value-to-go function, 215
VaR, 183
    a, 183
volatility, 244
volatility smile, 316
volume-weighted average price (VWAP), 204


weak duality theorem, 21
winner selection problem, see combinatorial auction problem


